
UNCLASSIFIED 

Technical  Report 
distributed  by 

Defense 
Technical 
Information 
Center 

Acquiring  Information  - 
Imparting  Knowledge 

Defense  Logistics  Agency 

Defense  Technical  Information  Center 

Cameron  Station 
Alexandria,  Virginia  22304-6145 




UNCLASSIFIED 


UNCLASSIFIED 


NOTICE 

We  are  pleased  to  supply  this  document  in  response  to  your  request. 

The  acquisition  of  technical  reports,  notes,  memorandums,  etc.,  is  an  active, 
ongoing  program  at  the  Defense  Technical  Information  Center  (DTIC)  that 
depends,  in  part,  on  the  efforts  and  interest  of  users  and  contributors. 

Therefore,  if  you  know  of  the  existence  of  any  significant  reports,  etc.,  that  are 
not  in  the  DTIC  collection,  we  would  appreciate  receiving  copies  or  information 
related  to  their  sources  and  availability. 

The appropriate regulations are Department of Defense Directive 3200.12, DoD
Scientific and Technical Information Program; Department of Defense Directive
5230.24, Distribution Statements on Technical Documents (amended by Secretary
of Defense Memorandum, 18 Mar 1984, subject: Control of Unclassified Technology
with Military Application); American National Standards Institute (ANSI)
Standard Z39.18, Scientific and Technical Reports: Organization, Preparation,
and Production; Department of Defense 5200.1-R, Information Security Program
Regulation.

Our  Acquisition  Section,  DTIC-FDAB,  will  assist  in  resolving  any  questions  you 
may  have.  Telephone  numbers  of  that  office  are: 

(202)  274-6847,  (202)  274-6874  or  Autovon  284-6847,  284-6874. 

DO  NOT  RETURN  THIS  DOCUMENT  TO  DTIC 


EACH  ACTIVITY  IS  RESPONSIBLE  FOR  DESTRUCTION  OF  THIS 
DOCUMENT  ACCORDING  TO  APPLICABLE  REGULATIONS. 


UNCLASSIFIED 


IMAGE  UNDERSTANDING 


DECEMBER  1985 

Sponsored  by: 

Information  Processing  Techniques  Office 
Defense  Advanced  Research  Projects  Agency 



(BOSTON  LOGAN  AIRPORT) 




FINAL  OUTPUT 

(DETECTED  RUNWAYS) 




Science  Applications  International  Corporation 




IMAGE  UNDERSTANDING 


Proceedings  of  a  Workshop 
Held  at 
Miami  Beach,  Florida 
December  9-10, 1985 


Sponsored  by  the 
Defense Advanced Research Projects Agency


Science Applications International Corporation
Report Number SAIC-85/1149
Lee S. Baumann
Workshop Organizer and
Proceedings Editor


This report was supported by
The Defense Advanced Research
Projects Agency under DARPA
Order No. 3466, Contract No. MDA903-84-C-0160
Monitored  by  the 
Defense  Supply  Service,  Washington,  D.C. 


APPROVED FOR PUBLIC RELEASE

DISTRIBUTION UNLIMITED


The views and conclusions contained in this document are those of the authors and should not be interpreted as
necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects
Agency or the United States Government.


UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE          READ INSTRUCTIONS BEFORE COMPLETING FORM

1.  REPORT NUMBER: SAIC-85/1149
2.  GOVT ACCESSION NO.:
3.  RECIPIENT'S CATALOG NUMBER:
4.  TITLE (and Subtitle): IMAGE UNDERSTANDING
    Proceedings of a Workshop, December 1985
5.  TYPE OF REPORT & PERIOD COVERED: ANNUAL TECHNICAL
    October 1984 - December 1985
6.  PERFORMING ORG. REPORT NUMBER:
7.  AUTHOR(s): LEE S. BAUMANN (Ed.)
8.  CONTRACT OR GRANT NUMBER(s): MDA903-84-C-0160
9.  PERFORMING ORGANIZATION NAME AND ADDRESS:
    SCIENCE APPLICATIONS INTERNATIONAL CORPORATION
    1710 Goodridge Drive, 10th Floor
    McLean, Virginia 22102
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
    ARPA ORDER No. 3456
11. CONTROLLING OFFICE NAME AND ADDRESS:
    Defense Advanced Research Projects Agency
    1400 Wilson Boulevard
    Arlington, Virginia 22209
12. REPORT DATE: December 1985
13. NUMBER OF PAGES: 515
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report):
    APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):
18. SUPPLEMENTARY NOTES:
19. KEY WORDS (Continue on reverse side if necessary and identify by block number):
    Digital Image Processing; Image Understanding; Scene Analysis; Edge
    Detection; Image Segmentation; CCD Arrays; CCD Processors.
20. ABSTRACT (Continue on reverse side if necessary and identify by block number):
    This document contains the outlines of annual progress reports and technical
    papers presented by the research activities in Image Understanding, sponsored
    by the Information Processing Techniques Office, Defense Advanced Research
    Projects Agency. The papers were presented at a workshop conducted on
    9-10 December 1985, in Miami Beach, Florida. Also included are copies of
    invited papers presented at the workshop and additional technical papers from
    the research activities which were not presented due to lack of time but are
    germane to this research field.

DD FORM 1473     EDITION OF 1 NOV 65 IS OBSOLETE


UNCLASSIFIED


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


TABLE  OF  CONTENTS 


FOREWORD

DEFENSE TECHNICAL INFORMATION CENTER ACCESSION NUMBERS

AUTHOR INDEX


SECTION I - PROGRAM REVIEWS BY PRINCIPAL INVESTIGATORS

"Image Understanding Techniques for Autonomous Vehicle Navigation",
A. Rosenfeld and L.S. Davis; University of Maryland

"Image Understanding Research at CMU",
T. Kanade and S. Shafer; Carnegie-Mellon University

"Image Understanding Research at USC: 1984-85",
R. Nevatia; University of Southern California

"Recent Progress of the Rochester Image Understanding Project",
J.A. Feldman and C.M. Brown; University of Rochester

"Spatial Understanding",
T.O. Binford; Stanford University

"MIT Progress in Understanding Images",
T. Poggio and the staff; Massachusetts Institute of Technology

"The SRI Image Understanding Research Program",
M.A. Fischler; SRI International

"Image Understanding Research at Columbia",
J.R. Kender; Columbia University

"Summary of Progress in Image Understanding at the University of Massachusetts",
E.M. Riseman and A.R. Hanson; University of Massachusetts


SECTION II - INVITED TECHNICAL REPORTS                                  Page

"Knowledge-Based Interpretation Aids to the Navy Oceanographic Image Analyst",
LCDR J.D. McKendrick and M. Lybanon;
Naval Ocean Research and Development Activity ........................ 61

"The BSPI Vision System",
T.C. Rearick; Lockheed-Georgia Company ............................... 64

"Dynamic Archival Scene Model",
H.N. Nasr, R.K. Aggarwal and D.P. Panda; Honeywell Inc. .............. 74

"Image Understanding Research at General Electric",
J.L. Mundy; General Electric ......................................... 83

"Structure and Motion From Images",
J.K. Aggarwal and A. Mitiche; The University of Texas at Austin ...... 89

"Surfaces From Stereo",
W. Hoff and N. Ahuja; University of Illinois ......................... 98


SECTION III - TECHNICAL REPORTS PRESENTED

"Structure from Motion Without Correspondence: General Principle",
K. Kanatani; University of Maryland .................................. 107

"Robust Estimation of 3-D Motion Parameters from a Sequence of Image
Frames Using Regularization",
G. Medioni and Y. Yasumoto; University of Southern California ........ 117

"Contour, Orientation and Motion",
J. Aloimonos, A. Basu and C.M. Brown; University of Rochester ........ 129

"Epipolar-Plane Image Analysis: A Technique for Analyzing Motion Sequences",
R.C. Bolles and H.H. Baker; SRI International ........................ 137


"SRI's Baseline Stereo System",
M.J. Hannah; SRI International ....................................... 149

"Concurrent Multilevel Relaxation",
D. Terzopoulos; Massachusetts Institute of Technology ................ 156

"Trinocular Vision Using Photometric and Edge Orientation Constraints",
V.J. Milenkovic and T. Kanade; Carnegie-Mellon University ............ 163

"Edgel-Aggregation and Edge-Description",
V.S. Nalwa and E. Pauchon; Stanford University ....................... 176

"Introducing a Smoothness Constraint in a Matching Approach for the
Computation of Displacement Fields",
P. Anandan and R. Weiss; University of Massachusetts ................. 186

"On Surface Reconstruction Using Sparse Depth Data",
T.E. Boult and J.R. Kender; Columbia University ...................... 197

"Labelling Line Drawings of Curved Objects",
J. Malik; Stanford University ........................................ 209

"Solving the Depth Interpolation Problem with the Adaptive Chebyshev
Acceleration Method on a Parallel Computer",
D.J. Choi and J.R. Kender; Columbia University ....................... 219

"First Results on Outdoor Scene Analysis Using Range Data",
M. Hebert and T. Kanade; Carnegie-Mellon University .................. 224

"Description of Surfaces from Range Data",
T.J. Fan, G. Medioni and R. Nevatia; University of Southern California 232

"Disparity Functionals and Stereo Vision",
R.D. Eastman and A.M. Waxman; University of Maryland ................. 245


"Evidence Combination for Vision using Likelihood Generators",
D. Sher; University of Rochester ..................................... 255

"Locating Cultural Regions in Aerial Imagery Using Geometric Cues",
P. Fua and A.J. Hanson; SRI International ............................ 271

"The Information Fusion Problem and Rule-Based Hypotheses Applied to
Complex Aggregations of Image Events",
R. Belknap, E. Riseman and A. Hanson; University of Massachusetts .... 279


SECTION IV - TECHNICAL PAPERS NOT PRESENTED

"Probabilistic Solution of Ill-Posed Problems in Computational Vision",
J. Marroquin, S. Mitter and T. Poggio;
Massachusetts Institute of Technology ................................ 293

"Stereo Verification in Aerial Image Analysis",
D.M. McKeown, C.A. McVay and B.D. Lucas; Carnegie-Mellon University .. 310

"The Terrain-Calc System",
L.H. Quam; SRI International ......................................... 327

"Converting Feature Values to Evidence",
G. Reynolds, D. Strahman and N. Lehrer;
University of Massachusetts at Amherst ............................... 331

"Binocular Image Flows",
A.M. Waxman and J.H. Duncan;
University of Maryland and Flow Research Company ..................... 340

"Detecting Structure in Random-Dot Patterns",
R. Vistnes; Stanford University ...................................... 350

"One-Eyed Stereo: A General Approach to Modeling 3-D Scene Geometry",
T.M. Strat and M.A. Fischler; SRI International ...................... 363



"Stereo Correspondence: Features and Constraints",
H.S. Lim and T.O. Binford; Stanford University ....................... 373

"Direct Passive Navigation: Analytical Solution for Planes",
S. Negahdaripour and B.K.P. Horn;
Massachusetts Institute of Technology ................................ 381

"Analysis of an Algorithm for Detection of Translational Motion",
I. Pavlin, E. Riseman and A. Hanson; University of Massachusetts ..... 388

"Inherent Ambiguities in Recovering 3-D Motion and Structure from a
Noisy Flow Field",
G. Adiv; University of Massachusetts ................................. 399

"Refinement of Environmental Depth Maps Over Multiple Frames",
S. Bharwani, E. Riseman and A. Hanson; University of Massachusetts ... 413

"Multiresolution Path Planning for Mobile Robots",
S. Kambhampati and L.S. Davis; University of Maryland ................ 421

"Error Detection and Correction for Stereo",
R. Mohan; University of Southern California .......................... 433

"Geometric Grouping of Straight Lines",
R. Weiss, A. Hanson and E. Riseman; University of Massachusetts ...... 443

"On Detecting Edges",
V.S. Nalwa and T.O. Binford; Stanford University ..................... 450

"Visual Surface Interpolation: A Comparison of Two Methods",
T.E. Boult; Columbia University ...................................... 466

"Predicting Specular Features",
G. Healey and T.O. Binford; Stanford University ...................... 479


"A Provably Convergent Algorithm for Shape from Shading",
D. Lee; AT&T Bell Laboratories ....................................... 489

"Generalized Cone Descriptions from Sparse 3-D Data",
K.G. Rao and R. Nevatia; University of Southern California ........... 497

"Equivalent Descriptions of Generalized Cylinders",
K.S. Roberts; Columbia University .................................... 506

"The Calibrated Imaging Lab Under Construction at CMU",
S.A. Shafer; Carnegie-Mellon University .............................. 509



FOREWORD 


The Sixteenth Image Understanding Workshop was held
in Miami Beach, Florida on December 9-10, 1985. This workshop,
attended by more than one hundred research and government
personnel, was the first conducted by the new Defense
Advanced Research Projects Agency Program Manager for IU, LTC
Robert L. Simpson, Jr. Following his welcoming remarks, LTC
Simpson presented his views of the state of the DARPA
research program for Image Understanding. The following
paragraphs summarize LTC Simpson's views under the title,
"Look Back, Look Forward."

As the new program manager of the
DARPA Image Understanding Program, I
believe it is safe to say that we have
arrived at an important stage in the
history of this impressive research program.
Looking back, we can see the
evolution of image understanding and its
impact on defense capabilities. Originally
conceived as a five year program in
1975 by Lt Col David Carlstrom, the first
several years of IU established the
strong base of low-level vision techniques
and knowledge-based sub-systems
that began to differentiate the program
from what is usually called "image processing".
In the late 70's and early
80's, under the direction of Lt Col Larry
Druffel, the program saw the development
of model-based vision systems such as
ACRONYM and the demonstration of IU techniques
in more meaningful concept demonstrations
such as the DARPA/DMA image
understanding testbed. These demonstrations
and their potential for future
military use warranted the continuation
of the IU program beyond its initial five
year lifespan. Under CDR Ron Ohlander,
IU technology continued to mature to the
point that the DARPA Strategic Computing
Program could justify a major application,
the autonomous land vehicle.

The basic theme of the IU program
has remained the same as quoted by Larry
Druffel in 1981:


to  investigate  application 
of  a  priori  knowledge  to 
facilitate  an  understanding  of 
the  relationship  among  objects 
in  a  scene.  The  appropriate 
focus  is  on  the  world  under¬ 
standing.  ... [The  Image  Under¬ 
standing  Program]  is  a  catalyst 
which  attempts  an  integration 
of  many  sciences  [image  pro¬ 
cessing,  pattern  recognition, 
computer  science,  artificial 
intelligence,  neurophysiology, 
and  physics]  in  search  of 
methods  for  automatic  extrac¬ 
tion  of  information  from  imag¬ 
ery.  (Druffel,  1981,  pp.  2-3) 

Looking  to  the  future,  we  need  to 
identify  the  real  defense  applications 
that  have  been  made  possible  by  the  basic 
IU  research.  Otherwise,  we  will  have 
difficulty  justifying  the  continuation  of 
IU.  For  those  of  you  readers  in  the 
Department  of  Defense,  I  need  to  know 
your  assessment  of  the  value  of  IU  to  you 
and  your  mission.  I  solicit  your  evalua¬ 
tion  and  suggestions. 

One major new thrust to come from
the IU foundation has been the autonomous
vehicle application of computer vision
technology in DARPA's Strategic Computing
Program. While representing an important
new showcase for IU technology, we need
to carefully assess the importance of
basic high risk research in generic image
understanding as a separate program from
other DARPA initiatives such as SC.

There  remains  much  to  accomplish 
before  we  can  perform  visual  information 
processing  in  problem  solving  context  at 
the  same  or  greater  capability  of  human 
beings.  This  continues  to  be  our  vision 
for  the  future. 




The  purpose  of  the  workshop  was  to  present  the 
current  research  results  and  on-going  efforts  to  the  commun¬ 
ity  as  a  whole  and  to  foster  an  interchange  of  technical 
discussions  leading  toward  improved  communications  and  wider 
utilization  of  mature  technology.  This  year  the  workshop 
featured  a  new  session  consisting  of  invited  papers  by  other 
than  DARPA  sponsored  researchers  in  an  effort  to  broaden  the 
horizons  of  the  group  as  a  whole.  This  proceedings  consists 
of  the  P.I.  program  reviews  in  Section  I,  the  invited  papers 
in  Section  II,  and  the  technical  papers  presented  at  the 
workshop  in  Section  III.  At  the  request  of  the  program 
manager,  other  technical  papers  for  which  time  was  not  avail¬ 
able  for  presentation  are  included  as  Section  IV  so  that  the 
document  can  serve  the  community  as  a  more  comprehensive 
reference  for  research  papers  on  this  research  area. 

This proceedings has been supplied to the Defense
Technical Information Center (DTIC) and copies may be secured
from that Agency by writing to the following address:

Defense Technical Information Center
Cameron Station, Bldg. 5
Alexandria, VA 22314

A small charge is assessed by the DTIC for reproduction
expenses. Accession number for this proceedings is
not yet available but will be assigned by the DTIC within the
next thirty days. Accession numbers for previous issues are
listed on the following page.

The pictures on the cover were provided by the
Intelligent  Systems  Group,  Departments  of  Electrical  Engi¬ 
neering  and  Computer  Science,  University  of  Southern 
California,  from  one  of  their  research  efforts.  Dr.  Ram 
Nevatia,  Principal  Investigator  for  the  DARPA  image  research 
at  USC,  described  the  sequences  as  follows: 

The pictures on the cover are from a
project  in  making  maps  from  aerial  images 
at  the  University  of  Southern  California 
(USC).  This  task  focuses  on  detecting 
runway  structures  in  airport  images.  The 
front  cover  shows  part  of  an  aerial  image 
of  the  Boston  Logan  Airport.  Detection 
of  runways  in  this  image  is  a  seemingly 
simple task for us, but in fact has many
complexities.  The  runways  have  many 
markings,  oil  and  tire-tread  marks  and 


surface  material  is  not  always  homoge¬ 
neous.  The  lower  picture  on  the  front 
cover  shows  the  pieces  of  runways  detect¬ 
ed  by  the  program  after  a  sequence  of 
processing. 

The back cover shows the results at
some of the intermediate steps. The
first figure on the back shows the line
segments detected in the image. The
complexity of the task becomes much more
apparent in this figure than in the original
image. Next, the segments are
organized in "anti-parallel" pairs of
lines (lines that are parallel but of
opposite contrast) based on the observation
that runways, taxiways, roads etc.
are characterized by such lines. The
middle figure on the back cover shows the
anti-parallels of a specific width and
direction corresponding to the expected
width and the directions of the runways.
The next step is to connect the anti-parallel
pairs based on a variety of
evidence: first, the very strong evidence
such as sharing segments and strict collinearity
(giving the last figure on the
back) and then much weaker evidence leading
to the figure on the front cover.

We  would  like  to  point  out  that  the 
pictures  shown  represent  work  in  progress 
demonstrating  the  feasibility  of  the 
chosen  task  rather  than  a  final,  finished 
algorithm  to  perform  the  runway  descrip¬ 
tion  task. 

The  artwork  and  layout  designs  were  done  by  Mr.  Tom 
Dickerson  of  the  SAIC  graphics  staff.  Significant  assistance 
in  handling  the  mailings,  and  in  putting  the  proceedings 
together for publication was provided by Barbara Burkett,
Mary  Hollingsworth  and  Dianne  Williams  of  the  Science  Appli¬ 
cations  International  Corporation  administrative  staff. 
Their  hard  work  and  diligence  made  possible  the  organization 
of  the  workshop  and  the  ability  to  get  the  proceedings  papers 
to  the  printers  on  schedule. 


Lee S. Baumann
Science  Applications 
International  Corporation 
Workshop  Organizer 


DEFENSE TECHNICAL INFORMATION CENTER
ACCESSION NUMBERS FOR PREVIOUS
I.U. WORKSHOPS


ISSUE          DATE     NUMBER

APRIL          1977     AD A052900
OCTOBER        1977     AD A052901
MAY            1978     AD A052902
NOVEMBER       1978     AD A064765
APRIL          1979     AD A069515
NOVEMBER       1979     AD A077568
APRIL          1980     AD A084764
APRIL          1981     AD A098281
SEPTEMBER      1982     AD A120072
JUNE           1983     AD A130251
OCTOBER        1984     AD A149496


AUTHOR INDEX (Continued)


NAME                     PAGE

Hannah, M.J.             149
Hanson, A.J.             271
Hanson, A.R.             48, 279, 388, 413, 443
Healey, G.               479
Hebert, M.               224
Hoff, W.                 98
Horn, B.K.P.             381
Kambhampati, S.          421
Kanade, T.               4, 163, 224
Kanatani, K.             107
Kender, J.R.             45, 197, 219
Lee, D.                  489
Lehrer, N.               331
Lim, H.S.                373
Lucas, B.D.              310
Lybanon, M.              61
Malik, J.                209
Marroquin, J.            293
McKendrick, J.D.         61
McKeown, D.M.            310
McVay, C.A.              310
Medioni, G.              117, 232
Milenkovic, V.J.         163
Mitiche, A.              89
Mitter, S.               293
Mohan, R.                433
Mundy, J.L.              83
Nalwa, V.S.              176, 450
Nasr, H.N.               74
Negahdaripour, S.        381
Nevatia, R.              10, 232, 497
Panda, D.P.              74
Pauchon, E.              176
Pavlin, I.               388
Poggio, T.               25, 293
Quam, L.H.               327
Rao, K.G.                497
Rearick, T.C.            64
Reynolds, G.             331
Riseman, E.M.            48, 279, 388, 413, 443
Roberts, K.S.            506
Rosenfeld, A.            1
Shafer, S.A.             4, 509
Sher, D.                 255
Strahman, D.             331
Strat, T.M.              363
Terzopoulos, D.          156
Vistnes, R.              350
Waxman, A.M.             245, 340
Weiss, R.                186, 443
Yasumoto, Y.             117



SECTION  I 


PROGRAM  REVIEWS  BY 
PRINCIPAL  INVESTIGATORS 




IMAGE  UNDERSTANDING  TECHNIQUES 
FOR  AUTONOMOUS  VEHICLE  NAVIGATION 

Azriel Rosenfeld
Larry S. Davis

Center for Automation Research
University of Maryland
College Park, MD 20742


ABSTRACT 

This report briefly summarizes research carried out during
the period September 1984-October 1985 on Contract
DAAK70-83-K-0018 (DARPA Order 3208). The focus of
this research is on image understanding techniques applicable
to autonomous land vehicle navigation. Particular
emphasis has been placed on time-varying imagery
analysis, but some work has also been done on stereopsis
and on three-dimensional scene geometry.


1.  INTRODUCTION 

The Computer Vision Laboratory of the Center for
Automation Research at the University of Maryland has
been funded under the DARPA Image Understanding Program
since 1978. The current contract, entitled "Autonomous
Vehicle Navigation", was initiated in December
1982. It was funded through the U.S. Army Night Vision
and Electro-Optics Laboratory in Fort Belvoir, VA under
Contract DAAK70-83-K-0018, with Dr. George Jones as
COTR.

The research being conducted under the contract is
concerned with image understanding techniques for autonomous
land vehicle navigation. (Our Laboratory is also
funded under the DARPA Strategic Computing Program
to develop algorithms for road network following and obstacle
avoidance, to be demonstrated on the DARPA Autonomous
Land Vehicle; this work will not be described here.)
Emphasis has been placed on techniques for time-varying
imagery analysis, but some work has also been done on
stereopsis and on three-dimensional scene geometry.
Specific accomplishments in each of these areas are
described in the following sections.


2.  LAND  VEHICLE  NAVIGATION 

Several studies were conducted on the project that
are relevant to the problem of land vehicle navigation;
they are described in the following paragraphs.

A methodology for landmark-based vehicle position
determination is developed in [1]. This report describes a
system by which an autonomous land vehicle might
improve its estimate of its current position. This system
selects visible landmarks from a database of knowledge
about its environment and controls a camera's direction
and focal length to obtain images of these landmarks.
The landmarks are then located in the images using a
modified version of the generalized Hough transform, and
their locations are used to triangulate to obtain the new
estimate of vehicle position and position uncertainty.
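
The triangulation step alone can be illustrated with a minimal sketch. It is
not the method of [1]: it assumes that each located landmark has already been
converted to an absolute (world-frame) bearing from the vehicle, that the
landmark coordinates are known exactly, and it returns only a position, with
no uncertainty estimate. The function name and argument layout are
hypothetical.

```python
import math

def triangulate(landmarks, bearings):
    """Least-squares vehicle position from bearings to known landmarks.

    landmarks: list of (x, y) world coordinates (assumed known).
    bearings:  list of world-frame angles in radians, measured from the
               vehicle toward each landmark (assumed already computed).
    """
    sxx = sxy = syy = bx = by = 0.0
    for (lx, ly), theta in zip(landmarks, bearings):
        # Unit normal to the line of sight; the vehicle position p lies on
        # the line through the landmark, so n . p = n . landmark.
        nx, ny = -math.sin(theta), math.cos(theta)
        c = nx * lx + ny * ly
        # Accumulate the 2x2 normal equations of the least-squares problem.
        sxx += nx * nx
        sxy += nx * ny
        syy += ny * ny
        bx += nx * c
        by += ny * c
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("lines of sight are (nearly) parallel")
    x = (syy * bx - sxy * by) / det
    y = (sxx * by - sxy * bx) / det
    return x, y
```

With two landmarks the result is the intersection of the two lines of sight;
with more, it is the point minimizing the summed squared perpendicular
distances to all of them.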

Another  study  is  concerned  with  simulation  of  road 
images  as  seen  from  a  vehicle  [2].  A  model  of  a  road  is 
proposed  which  incorporates  parameters  that  specify  the 
curvature and slope of the road. Images synthesized using
this  model  appear  quite  natural.  This  shows  that  it  would 
be  sufficient  to  extract  these  parameters  in  order  to  obtain 
3D  information  about  the  road  in  road  image  analysis. 

Methods of color image smoothing, segmentation and
edge detection are developed in [3] in connection with
extending road scene analysis algorithms to color imagery. A
new measure of edge information for color images based
on cumulative histograms of absolute color differences is
proposed. A multispectral version of the Symmetric
Nearest Neighbor filter for edge-preserving smoothing and
methods for image segmentation and edge detection are
developed based on this measure. Experimental results
show that the performance of the new algorithms is good.
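
For readers unfamiliar with the Symmetric Nearest Neighbor filter, the
following is a minimal single-band (gray-level) sketch of the basic idea. It
is not the multispectral version of [3] and does not use the new edge measure
described there; the function name, the square window, and the choice to
leave border pixels unchanged are assumptions made for illustration.

```python
def snn_filter(image, radius=1):
    """Symmetric Nearest Neighbor smoothing of a gray-level image.

    image is a list of equal-length rows of numbers. For every pair of
    window pixels placed symmetrically about the center, the one whose
    value is closer to the center value is kept; the output is the mean
    of the kept values. Border pixels are copied unchanged.
    """
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]
    # One offset from each symmetric pair in the (2*radius+1)^2 window.
    half = [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if (dy, dx) > (0, 0)]
    for r in range(radius, rows - radius):
        for c in range(radius, cols - radius):
            center = image[r][c]
            total = 0.0
            for dy, dx in half:
                a = image[r + dy][c + dx]
                b = image[r - dy][c - dx]
                total += a if abs(a - center) <= abs(b - center) else b
            out[r][c] = total / len(half)
    return out
```

Because each pair contributes the member nearer in value to the center, pixels
on either side of a step edge are averaged only with pixels from their own
side, which is why the filter smooths without blurring edges.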

Another report deals with path planning for mobile
robots [8]. The problem of automatic collision-free path
planning is central to mobile robot applications. In this
report, we present an approach to automatic path planning
based on a quadtree representation. We introduce
hierarchical path searching methods, which make use of
this multiresolution representation, to speed up the path
planning process considerably. Finally, we discuss the
applicability of this approach to mobile robot path planning.
This report is included in the Proceedings of this
Workshop.
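
The quadtree representation itself can be sketched as follows. This is only
the decomposition of a binary occupancy grid into uniform blocks, not the
hierarchical search of [8]; the grid encoding, the leaf tuple layout, and the
function name are assumptions made for illustration.

```python
def build_quadtree(grid, x=0, y=0, size=None):
    """Return the quadtree leaves (x, y, size, occupied) of a square grid.

    grid[y][x] is 1 for an obstacle cell and 0 for free space. The grid
    side is assumed to be a power of two. A block that is entirely free or
    entirely obstacle becomes one leaf; a mixed block is split into four
    quadrants, recursively.
    """
    if size is None:
        size = len(grid)
    cells = [grid[j][i]
             for j in range(y, y + size)
             for i in range(x, x + size)]
    if all(cells) or not any(cells):
        return [(x, y, size, bool(cells[0]))]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += build_quadtree(grid, x + dx, y + dy, half)
    return leaves
```

A planner working on this representation searches over the free leaves rather
than over individual cells, so large open regions cost a single node each.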


3. TIME-VARYING IMAGERY ANALYSIS

The Ph.D. dissertation of Kwangyoen Wohn, under
the direction of Allen M. Waxman, dealt with the
analysis of image flows produced by the motions of planar
or curved surfaces. In the first part of this work, reported
last year, we developed an algorithm, the Velocity Functional
Method, to recover an image flow field from time-varying
contours. The method follows directly from the
analytic structure of the underlying image flow; no heuristics
are imposed. Local image flow is modeled as a
second-order Taylor series. The method computes twelve
series coefficients from the normal component of image
flow measured along contours. For planar surfaces in
motion, the method yields the exact flow. We have
demonstrated the robustness of our algorithm by carrying
out the sensitivity analysis in the context of planar surfaces
executing general rigid body motions in space.
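
The count of twelve coefficients follows from writing each flow component as
a second-order Taylor series about the image origin. The notation below is
ours, chosen for illustration, and need not match that of the dissertation:
(u, v) is the image flow, subscripts denote partial derivatives evaluated at
the origin, and (n_x, n_y) is the unit normal to the contour at (x, y).

```latex
\begin{aligned}
u(x,y) &= u_0 + u_x x + u_y y + \tfrac{1}{2}u_{xx}x^2 + u_{xy}xy + \tfrac{1}{2}u_{yy}y^2,\\
v(x,y) &= v_0 + v_x x + v_y y + \tfrac{1}{2}v_{xx}x^2 + v_{xy}xy + \tfrac{1}{2}v_{yy}y^2,\\
v_n(x,y) &= n_x\,u(x,y) + n_y\,v(x,y).
\end{aligned}
```

Each series has six coefficients, giving twelve unknowns in all. Every contour
point where the normal flow v_n is measured contributes one equation that is
linear in those unknowns, so twelve or more well-distributed measurements
determine them, for example by least squares.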

The second part of Dr. Wohn's dissertation [7]
explores the additional aspects of the theory for curved
surfaces, where the second-order flow approximation is
only locally valid. We derive the dependence of the truncation
error on surface curvature and field of view. We
also investigate the sensitivity of solutions to noise in the
normal flow. The combined algorithms of 2-D flow estimation
and 3-D structure and motion recovery are not as
stable to input noise and surface structure as is the case
for planar surfaces. The use of multiple frames to overcome
the effects of noise is currently under study.

The Ph.D. dissertation of Muralidhara Subbarao, also under the
direction of Dr. Waxman, continues the work in image flow
analysis. [Dr. Waxman is now at Thinking Machines Corporation,
but he continues to direct Ph.D. research at the University of
Maryland.] In the first part of this research, two results on
the uniqueness of image flow solutions for planar surfaces in
motion have been developed [4]. The first result concerns
resolving the duality of interpretations that are generally
associated with the instantaneous image flow of an evolving
image sequence. It is shown that the interpretation for
orientation and motion of planar surfaces is unique when
either two successive image flows of one planar surface patch
are given or one image flow of two planar patches moving as a
rigid body is given. We have proved this by deriving explicit
expressions for the evolving solution of an image flow
sequence with time. These expressions can be used to resolve
this ambiguity of interpretation in practical problems. The
second result is the proof of uniqueness for the velocity of
approach which satisfies the image flow equations for planar
surfaces. In addition, it is shown that this velocity can be
computed as the middle root of a cubic equation. These two
results together suggest a new method for solving the image
flow problem for planar surfaces in motion.
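
As an illustration of the last step only, the sketch below
extracts the middle root of a cubic with three real roots,
using the standard trigonometric solution. The report does not
give the coefficients of its cubic here, so the coefficients
in the example are placeholders, and the function names
`real_roots_of_cubic` and `middle_root` are our own.

```python
import math


def real_roots_of_cubic(a, b, c, d):
    """Return the three real roots of a*x^3 + b*x^2 + c*x + d = 0, sorted.

    Uses the trigonometric form, which applies only when all three roots
    are real; raises ValueError otherwise.
    """
    if a == 0:
        raise ValueError("leading coefficient must be nonzero")
    # Substituting x = t + shift gives the depressed cubic t^3 + p*t + q = 0.
    p = (3 * a * c - b * b) / (3 * a * a)
    q = (2 * b ** 3 - 9 * a * b * c + 27 * a * a * d) / (27 * a ** 3)
    shift = -b / (3 * a)
    if p >= 0:
        if p == 0 and q == 0:
            return [shift, shift, shift]  # triple root
        raise ValueError("cubic does not have three real roots")
    m = 2 * math.sqrt(-p / 3)
    arg = 3 * q / (p * m)
    if abs(arg) > 1 + 1e-12:
        raise ValueError("cubic does not have three real roots")
    arg = max(-1.0, min(1.0, arg))  # guard against rounding just outside [-1, 1]
    theta = math.acos(arg) / 3
    roots = [m * math.cos(theta - 2 * math.pi * k / 3) + shift for k in range(3)]
    return sorted(roots)


def middle_root(a, b, c, d):
    """Return the middle of the three real roots."""
    return real_roots_of_cubic(a, b, c, d)[1]


# Placeholder cubic (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6.
example = middle_root(1.0, -6.0, 11.0, -6.0)
```

The function raises an error rather than guessing when the
cubic has fewer than three real roots, since the notion of a
middle root is then undefined.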

Dr. Waxman and Dr. James H. Duncan have collaborated on a
study of binocular image flow, as a step toward unifying
stereopsis and motion analysis [5]. The analyses of visual
data by stereo and motion modules have typically been treated
as separate, parallel processes which both feed a common
viewer-centered 2.5-D sketch of the scene. When acting
separately, stereo and motion analyses are subject to certain
inherent difficulties: stereo must resolve a combinatorial
correspondence problem and is further complicated by the
presence of occluding boundaries; motion analysis involves the
solution of nonlinear equations and yields a 3-D
interpretation specified up to an undetermined scale factor. A
new module is described here which unifies stereo and motion
analysis in a manner in which each helps to overcome the
other's shortcomings. One important result is a correlation
between relative image flow (i.e., binocular difference flow)
and stereo disparity; it points to the importance of the ratio
(dδ/dt)/δ, the rate of change of disparity δ relative to the
disparity δ itself, and its possible role in establishing
stereo correspondence. This report is included in the
Proceedings of this Workshop.
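
To give a feel for why the ratio of disparity rate to
disparity is a convenient quantity, the sketch below works it
out for the simplest case. It assumes a parallel-axis pinhole
stereo pair, for which disparity is focal length times
baseline divided by depth; this geometry and all names and
numbers are our assumptions, not details from the report.
Under that assumption the ratio reduces to minus the depth
rate divided by the depth, independent of focal length and
baseline.

```python
def disparity(focal_length, baseline, depth):
    """Disparity of a point at the given depth (parallel-axis stereo assumed)."""
    return focal_length * baseline / depth


def disparity_rate_ratio(depth, depth_rate):
    """Analytic (d delta/dt) / delta under the same assumption: -Z_dot / Z."""
    return -depth_rate / depth


# Hypothetical numbers: a point 4 m away, approaching at 2 m/s.
focal_length, baseline = 500.0, 0.1
depth, depth_rate = 4.0, -2.0

# Finite-difference check of the analytic ratio.
dt = 1e-6
d0 = disparity(focal_length, baseline, depth)
d1 = disparity(focal_length, baseline, depth + depth_rate * dt)
numerical_ratio = (d1 - d0) / dt / d0
analytic_ratio = disparity_rate_ratio(depth, depth_rate)
```

Because camera parameters cancel, the ratio can be compared
across candidate matches without calibration in this
simplified setting.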

Dr. Ken-ichi Kanatani of Gunma University, Japan, who is
currently at our Center, is doing research on various aspects
of time-varying imagery analysis and three-dimensional vision.
His first report on this project deals with the determination
of 3D scene structure and motion from an image sequence
without the need for point-to-point correspondence. The
procedure consists of two stages: (i) determination of the
flow parameters, which completely characterize the motion of
the planar part of the object, and (ii) computation of 3D
recovery from these flow parameters. The first stage is done
by measuring features of the image sequence. The second stage
is analytically expressed in terms of invariants with respect
to coordinate changes. Typical features and relations to
stepwise tracing are also discussed. This report is also
included in the Proceedings of this Workshop.


4.  THREE-DIMENSIONAL  SCENE  ANALYSIS 

The Ph.D. dissertation of Roger D. Eastman, also under the
direction of Dr. Waxman, deals with an approach to stereopsis
based on the analysis of disparity fields [8]. The first part
of his research investigates stereo matching constraints that
derive from an analytic model of surface depth. Computational
stereo is formulated as a single stage process in which
potential feature point or contour matches interact to provide
support for local estimates of a polynomial model of disparity
(the disparity functional), not just estimates of disparity at
isolated points. An algorithm is presented that integrates the
disparity functional with multiresolution matching of
zero-crossings to derive depth of surface patches. The
analyticity of the disparity field is thereby exploited early
in the matching process, and yields surface reconstruction as
a direct byproduct of correspondence. This report is also
included in the Proceedings of this Workshop.

Another stereopsis technique, based on the use of three
cameras, is described in [9]. A three-camera approach for
computational stereo is presented, which greatly simplifies
the search problem among candidate matches and allows matching
of horizontal edges. Only a simple camera geometry is
considered, in which the images are rectified in the same
plane. The horizontal and vertical images are equidistant from
and aligned parallel to the base image. The primitive objects
of the approach are labeled edge segments, i.e., 8-connected
chains of edge points with their local image properties. The
matching algorithm scans through the edge segments in the base
image and searches for corresponding triples of points in the
three images. Local properties of points are used to classify
matches. A preliminary evaluation of matches is based on
goodness of match criteria. A simple postprocessing method
based on contour connectivity is used to eliminate false
matches. The method performs well in experiments. The basic
matching algorithm generates only a few false matches and most
of these can be easily eliminated.
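
The search for corresponding triples can be pictured with the
following minimal sketch. It assumes the rectified geometry
described above, with equal baselines, so that a single
disparity d predicts the match location in both of the other
images; the sign convention, the set-based edge maps, and the
name `matching_disparities` are our assumptions. The local
property tests and goodness-of-match criteria mentioned in the
text are omitted.

```python
def matching_disparities(base_point, horizontal_edges, vertical_edges,
                         max_disparity):
    """Return every disparity d for which the base point has an edge point
    at the predicted location in BOTH of the other two images."""
    x, y = base_point
    matches = []
    for d in range(max_disparity + 1):
        # Assumed convention: disparity d shifts the point by d pixels along
        # x in the horizontal image and by d pixels along y in the vertical one.
        in_horizontal = (x - d, y) in horizontal_edges
        in_vertical = (x, y - d) in vertical_edges
        if in_horizontal and in_vertical:
            matches.append(d)
    return matches


# Hypothetical edge maps, stored as sets of (x, y) pixel coordinates.
horizontal_edges = {(7, 5), (4, 5)}   # two candidates on the base point's row
vertical_edges = {(10, 2), (10, 4)}   # two candidates on its column
triples = matching_disparities((10, 5), horizontal_edges, vertical_edges, 8)
```

In this toy case each two-image pairing alone is ambiguous
(two candidates each), while requiring agreement across all
three images leaves a single disparity.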

A general discussion of projective geometry and its use in
three-dimensional scene analysis is presented in [10].
Geometric properties are of key importance in the recovery of
scene structure from images. It is argued that the proper
formulations of the determination of scene geometry are
obtained when projective geometry is used. A framework of
projective geometry for computer vision is presented in brief
and its applicability is demonstrated in a simple example. A
computational approach to finding the necessary primitives is
reviewed.

REFERENCES 

1. Frederick P. Andresen and Larry S. Davis, "Visual position
determination for autonomous vehicle navigation", CAR-TR-100,
CS-TR-1458, November 1984.

2. Shinji Ozawa and Azriel Rosenfeld, "Synthesis of a road
image as seen from a vehicle", CAR-TR-111, CS-TR-1478, March
1985.

3. Matti Pietikainen and David Harwood, "Edge information in
color images based on histograms of differences", CAR-TR-112,
CS-TR-1479, March 1985.

4. Muralidhara Subbarao and Allen M. Waxman, "On the
uniqueness of image flow solutions for planar surfaces in
motion", CAR-TR-114, CS-TR-1485, April 1985.

5. Allen M. Waxman and James H. Duncan, "Binocular image
flows: steps toward stereo-motion fusion", CAR-TR-119,
CS-TR-1494, May 1985.

6. Subbarao Kambhampati and Larry S. Davis, "Multiresolution
path planning for mobile robots", CAR-TR-127, CS-TR-1507, May
1985.

7. Kwangyoen Wohn and Allen M. Waxman, "Contour evolution,
neighborhood deformation and local image flow: curved surfaces
in motion", CAR-TR-134, CS-TR-1531, July 1985.

8. Roger D. Eastman and Allen M. Waxman, "Disparity
functionals and stereo vision", CAR-TR-145, CS-TR-1547, August
1985.

9. Matti Pietikainen and David Harwood, "Multiple-camera
contour stereo", CAR-TR-151, CS-TR-1559, September 1985.

10. Ambjörn Naeve and Jan-Olof Eklundh, "On projective
geometry and the recovery of 3-D structure", CAR-TR-154,
CS-TR-1565, October 1985.

11. Ken-ichi Kanatani, "Structure from motion without
correspondence: general principle", technical report in
preparation.




Image  Understanding  Research  at  CMU 


Takeo Kanade
Steven Shafer


Computer Science Department
Carnegie-Mellon University
Pittsburgh PA 15213


Abstract

In the CMU Image Understanding Program we have been working
on both the basic issues in understanding vision processes that
deal with images and shapes, and the system issues in developing
demonstrable vision systems. This report reviews our progress
since the October 1984 workshop proceedings. The highlights in
our program include:

• Victor Milenkovic has developed an edge-based
trinocular (three camera) stereo method for computing
depth from images.

• Rich Szeliski has extended Ohta and Kanade's dynamic
programming stereo method to use a coarse-to-fine
multi-resolution search strategy.

• Ellen Walker is analysing the object-independent
geometric reasoning rules in the 3D Mosaic system.

• Steve Shafer is constructing the Calibrated Imaging
Lab, which will provide high-precision images for
stereo, motion, shape analysis, and photometric
analysis.

• Martial Hebert has developed several algorithms for
analysis of outdoor range images to extract edges,
planar faces of objects, and terrain patches.

• Larry Matthies is analysing motion stereo image
sequences using a statistical analysis of uncertainty to
yield high accuracy.

• Dave McKeown has started a Digital Mapping
Laboratory as a focal point for work in aerial photo
interpretation, cartography, and computer vision.
Current projects include MAPS, a large-scale
image/map database system, SPAM, a rule-based
system for airport scene interpretation, and ARF, a
system for finding and tracking roads in aerial imagery.

• Jon Webb is developing a high-performance vision
system on a systolic machine, Warp, which will be
actively used by the vision community at CMU. The
Warp hardware is a reality, and almost a dozen
implementation programs are now running.

• Gudrun Klinker has implemented the FIDO mobile
robot vision and navigation system using the Warp.

• Chuck Thorpe, Richard Wallace, and Tony Stentz are
working on the Strategic Computing Vision project,
building an intelligent mobile robot for outdoor
operation.


1. Shape Understanding

1.1 Trinocular Vision

We (Victor Milenkovic) are studying trinocular stereo, a variation
of edge-based stereo using three cameras instead of two [11].
Three cameras are arranged in an equilateral triangle and directed
perpendicular to the plane of the triangle. Edges are extracted
from the three images, and the trinocular algorithm matches
corresponding edge pixels using three constraints. The first
constraint is that the edge pixel positions must form an equilateral
triangle. The second constraint involves the edge orientations,
which must satisfy a specific relation based on the perspective
projection. The third constraint is photometric: we expect to find
matching patterns of intensity values near the edge pixel.
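
The first constraint can be illustrated with a small sketch.
It assumes three identical pinhole cameras with parallel
optical axes whose centers lie in a common plane, so that the
three projections of a scene point form a copy of the camera
triangle scaled by focal length over depth (and point
reflected). The camera model, the tolerance, and the names
`project` and `triangle_scale` are our assumptions; the
orientation and photometric constraints are not modelled.

```python
import math


def project(point, center, focal_length):
    """Pinhole projection for a camera at `center` looking along +Z."""
    X, Y, Z = point
    return (focal_length * (X - center[0]) / Z,
            focal_length * (Y - center[1]) / Z)


def triangle_scale(pixels, centers, tol=0.5):
    """Return s = focal_length / depth if the three pixel positions form a
    scaled copy of the camera triangle, otherwise None.

    Under the assumed model, p_i - p_0 = -s * (c_i - c_0) for every camera i.
    """
    (p0, p1, p2), (c0, c1, c2) = pixels, centers
    bx, by = c1[0] - c0[0], c1[1] - c0[1]
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    s = -(dx * bx + dy * by) / (bx * bx + by * by)
    if s <= 0:
        return None  # would place the point behind the cameras
    for p, c in ((p1, c1), (p2, c2)):
        ex = p[0] - p0[0] + s * (c[0] - c0[0])
        ey = p[1] - p0[1] + s * (c[1] - c0[1])
        if math.hypot(ex, ey) > tol:
            return None  # not consistent with a single scene point
    return s


# Hypothetical rig: equilateral triangle with 0.3 m sides, 600-pixel focal length.
side = 0.3
centers = [(0.0, 0.0), (side, 0.0), (side / 2, side * math.sqrt(3) / 2)]
focal_length = 600.0
point = (0.2, 0.1, 3.0)
pixels = [project(point, c, focal_length) for c in centers]
scale = triangle_scale(pixels, centers)
depth = focal_length / scale
```

A candidate triple that fails this test cannot come from a
single scene point under the assumed geometry, so it can be
discarded before the more expensive constraints are evaluated.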

Because of error, the matching algorithm uses statistical
confidence measures based on the second and third constraints
instead of using the constraints directly. It compares the
confidence measures of competing candidate matches in order to
determine which one is more likely to be correct. The algorithm
also compares the confidence measure to a statistically based
failure threshold in order to detect the situation where there is no
match or a partial match. An example of a partial match is an edge
arising from an occluding contour, in which case the edge position
and orientation constraints are satisfied, but only one side of the
edge satisfies the photometric constraint. A filtering method is
used for resolving unorderable match candidates, in which one
candidate has higher edge orientation confidence and the other
has better photometric confidence.

The trinocular stereo algorithm has been applied to both real and
synthetic images. It works very well on the synthetic images, and it
clearly demonstrates the matching of occluding contours. The
algorithm works well on real images even though the camera model
is somewhat inaccurate. Basically, the algorithm works as well as
the best binocular method (actually even better because it can
match horizontal edges) without using any sort of continuity
assumption. It does not need to assume that neighboring edge
pixels have similar disparities. For each edge pixel, it searches the
entire range of possible disparities, and it does not make any
assumptions about the order in which the matching edge pixels
appear.

A possible extension of this work is to do trinocular matching
simultaneously with edge tracing. The two processes could
reinforce each other; continuity along a contour would provide a
means of eliminating single pixel errors of the matching scheme,
while depth information would help the tracing take the correct
"fork" at a T junction. Once a good set of three-dimensional
contours were generated, they could be reliably extended past the
point at which they are visible from all three cameras. Instead of
suffering more from occlusion than two camera stereo, the
trinocular algorithm would then suffer less, because more points
are visible from two out of three cameras than from two out of two.

1.2 Multi-resolution Stereo Using Dynamic Programming

We (Rich Szeliski) have developed a multi-resolution version of
the dynamic programming stereo algorithm introduced by Ohta and
Kanade [14]. Their technique uses both intra- and inter-scanline
search to obtain a disparity map starting from gray-scale real world
images; we have now developed a faster version of the algorithm
using a coarse-to-fine multi-resolution search strategy.

The images are first pre-processed using the DOLP Transform to
build an image pyramid. The low-pass (blurred) images are used to
calculate the cost function of the stereo matcher, while the
band-pass images are used to extract the edges. The stereo matching
algorithm is then applied to the coarsest (smallest) image. The
result of this processing (which is a list of matched edges) is then
used to constrain the stereo matcher on the next finer (larger) level.
To generate these constraints, it is necessary to calculate the
correspondence between edges at various resolution levels, but this
is relatively easy and fast.

The matching proceeds until the solution for the finest level is
obtained. The combined processing time for the pyramid is much
lower than for single-resolution processing, since the constraints
from the previous level greatly reduce the search space. In practice
a speedup of 2.5 was observed. The quality of the results for the
single- and multi-resolution versions was similar.
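
The way a coarse match narrows the search at the next level
can be sketched as follows. The sketch assumes that each
pyramid level doubles the resolution of the previous one and
that a fixed margin of a pixel or two around the propagated
disparity is searched; both the factor of two and the margin
are our assumptions, and `fine_search_window` is a
hypothetical helper rather than part of the system described.

```python
def fine_search_window(coarse_disparity, margin=1, scale=2, max_disparity=None):
    """Disparity interval to search at the next finer level.

    A match with disparity d at the coarse level is assumed to lie within
    `margin` pixels of scale * d at the finer level.
    """
    center = scale * coarse_disparity
    low = max(0, center - margin)
    high = center + margin
    if max_disparity is not None:
        high = min(high, max_disparity)
    return low, high


# Hypothetical three-level pyramid with disparity ranges 0..15, 0..31, 0..63.
window_level_1 = fine_search_window(5)                    # from coarsest match d = 5
window_level_2 = fine_search_window(11, max_disparity=63)  # from level-1 match d = 11

# Candidates examined for one edge: full range at the coarsest level only.
examined = 16 + (window_level_1[1] - window_level_1[0] + 1) \
              + (window_level_2[1] - window_level_2[0] + 1)
single_resolution = 64
```

The candidate counts in this toy example only show how the
search space shrinks; they do not account for pyramid
construction or edge correspondence between levels, so they
should not be read as a prediction of the measured speedup.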

1.3 Rule-Based Geometric Reasoning for Photo
Interpretation

We (Ellen Walker) have written an interactive version of the 3D
Mosaic system and are using it to study the existing rules in the
system and to determine new rules to be implemented. We hope to
develop a system with more explicit use of domain knowledge
which will be more robust, as well as capable of being easily
adapted to other domains. One way we hope to do this is to add a
set of low-level image processing tools, along with a procedure to
determine the applicability of each tool. These tools would be used
for bottom-up verification of hypotheses developed by the top-down
component of the system. For example, in the 3D Mosaic system,
when a roof is found, edges are hypothesized from each vertex of
the roof to the ground. We are studying operators which will verify
these hypothesized edges to determine which ones to use under
what conditions. The system is being tested on several aerial
images of Washington, D.C.

1.4 The Calibrated Imaging Laboratory

We (Steve Shafer) are building a Calibrated Imaging Laboratory
(CIL), which will be a facility for high-precision imaging with
accurate ground truth data [12]. This laboratory will help to bridge
the gap between computer vision theories, which typically depend
on unrealistic assumptions about the world, and applications,
which must function on real images. The CIL will span both areas
by providing real images in a controlled environment, with the
ability to incrementally add more complexity to the imaging
situation and the scene. In all cases, accurate ground truth data
will make it possible to quantitatively evaluate the performance of
the methods used for image analysis. The CIL will be used to study
stereo, motion analysis, geometric shape recovery, and
photometric and color analysis.

The facilities of the CIL include:


• Lighting Control provided by a near-point light source
(arc lamp) for precision shadow analysis, and a
complete track lighting system for flexible general
illumination.

• Background Reflection Control in a room with black
ceiling, black carpet, and black or white curtains, with
other colored backdrops as needed.

• High-Precision Color Images provided by a
custom-built camera yielding 512x512x6 images with each pixel
value being repeatable (noise-free) and linearly related
to scene radiance, using color filters in a filter wheel.

• Precision Stereo and Motion Image Sets provided by a
mobile platform with precision X-Y-Z-pan-tilt controls
and a pair of CCD cameras aligned for stereo
correspondence.

• Objects including simple objects for viewing and a
scale model landscape that presents a variety of
surface property, motion, and occlusion situations.

• Calibration Data provided by appropriate tools,
including photometers, precision targets, and
calibration camera filters.

• Accurate Ground Truth Data provided by an optical
table with precision position control devices and
surveyors' transits for position measurement.

2. 3D Data Analysis

2.1  Range  Data  Analysis  for  Outdoor  Imagery 
We (Martial Hebert) are developing algorithms for range data
analysis for outdoor imagery [2]. Our goal is to develop a 3D vision
system that provides a description of an unknown environment to a
mobile robot. This description, a three-dimensional map of the
observed scene in which regions are labeled as accessible terrain,
objects, etc., will provide the necessary information for path
planning and landmark recognition. We use a state-of-the-art
sensing device, the ERIM scanner, which is able to produce 64x256
range images at a frame rate of two images per second with an
accuracy of 0.4 feet. This sensor combines a large field of view (30
degrees horizontal and 40 degrees vertical) and a fast acquisition
rate, making it suitable for outdoor imagery analysis. For our
experiments, the sensor is mounted on a testbed mobile robot so
that the algorithms are tested in a realistic outdoor navigation
environment.

We have developed several range data segmentation algorithms.
The first goal of these algorithms is to extract three types of
features: 3D edges, terrain regions divided into accessible and
non-accessible ones, and obstacles divided into pseudo-planar
regions. The final product of the segmentation is a graph of objects,
regions, and edges. The segmentation algorithms proceed by first
extracting low-level attributes such as edge points, surface
normals, and surface curvature. Then, each attribute is used to derive
an intermediate segmentation. Finally, the intermediate segmentations
are merged together to form a consistent scene description. The
complete segmentation takes about one minute on a VAX-785. This
computation time will later be reduced by using the Warp systolic
array processor. The segmentation programs have been used to
produce input to a path planning program for a mobile robot. Our
results show that range data analysis provides reliable information
for outdoor navigation in a static environment.

The techniques described so far proceed by independently
processing one image at a time. In an outdoor navigation system,
consecutive images are related to each other to develop a global
map. That is, the robot would grab an image every 1-10 meters such
that each image is registered with respect to the previous ones. We
have developed matching techniques for registering consecutive
images. Matching proceeds by finding the best match between the
features produced by the segmentation program; this matching in
turn provides an estimate of the 3D position of the current image
with respect to the current global map. The global map obtained by
merging sequences of images may be used to predict the aspect of
an environment already traversed. We have tested the
matching on images obtained by the testbed vehicle. Results of the
matching technique are presented in these proceedings.

We are in the process of combining range data with other sources
of visual information such as color images, or reflectance data, in
order to obtain a more accurate environment description.


2.2  Motion  Stereo  Analysis 

We (Larry Matthies) are studying the use of probabilistic error
models in motion solving from stereo. Previous applications of
stereo to motion estimation have not made full use of the available
knowledge about the three-dimensional uncertainty of point
features. Our simulations suggest that reductions in the estimation
error on the order of a factor of ten are possible when this
information is taken into account.

The basic approach to motion solving from stereo
correspondences involves three steps: building a 3-D model of the
scene as viewed from the first vehicle location, building another
model from the second vehicle location, and finding the best-fit
motion that maps one model to the other. By modelling uncertainty
explicitly in the first two steps, we can obtain better results for the
last step of estimating motion. Previous efforts to model this
uncertainty have associated a scalar with each feature point that
tells the reliability of its 3D position estimate; these are then utilized
as weights in a least-squares approach to motion estimation. The
reliability weights are usually simply inversely proportional to the
distance of each point from the vehicle.

We can do better than this by modelling the feature points as
random variables with 3-D normal distributions. The means and
covariances of these distributions can be estimated from the
images. This leads to a revised least squares formulation in which
the weights are now 3x3 matrices formed from the covariance
matrices. These effectively replace the usual distance metric by a
new one that gives more weight to errors of fit in directions where
positional uncertainty is low and less weight in directions where
uncertainty is high. Typically triangulation leads to more
uncertainty along the line of sight than perpendicular to it. Thus,
when we transform one model to another, errors of fit are weighted
less along the line of sight than perpendicular to it. This approach
to error modelling is a standard method of photogrammetry, which
we have now successfully applied to computer vision.
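
A minimal sketch of the matrix-weighted formulation follows.
To keep it short it estimates a pure translation only, whereas
the work described estimates general motion; the choice of
weight matrix as the inverse of the summed covariances of a
matched pair, the helper names, and the numbers are all our
assumptions.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular matrix")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def mat_vec(m, v):
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def estimate_translation(points_a, points_b, covs_a, covs_b):
    """Translation t minimising sum_i r_i^T W_i r_i with r_i = b_i - a_i - t
    and W_i = inverse(cov_a_i + cov_b_i)."""
    w_sum = [[0.0] * 3 for _ in range(3)]
    wr_sum = [0.0] * 3
    for a, b, ca, cb in zip(points_a, points_b, covs_a, covs_b):
        cov = [[ca[r][k] + cb[r][k] for k in range(3)] for r in range(3)]
        w = inv3(cov)
        wr = mat_vec(w, [b[k] - a[k] for k in range(3)])
        for r in range(3):
            wr_sum[r] += wr[r]
            for k in range(3):
                w_sum[r][k] += w[r][k]
    return mat_vec(inv3(w_sum), wr_sum)


def diag(x, y, z):
    return [[x, 0.0, 0.0], [0.0, y, 0.0], [0.0, 0.0, z]]


# Two hypothetical matched points; z is taken as the line of sight.
points_a = [(0.0, 0.0, 10.0), (1.0, 0.0, 2.0)]
points_b = [(1.0, 0.0, 12.0), (2.0, 0.0, 3.0)]   # residuals (1,0,2) and (1,0,1)
covs_a = [diag(0.5, 0.5, 50.0), diag(0.5, 0.5, 0.5)]  # far point: poor depth
covs_b = [diag(0.5, 0.5, 50.0), diag(0.5, 0.5, 0.5)]
translation = estimate_translation(points_a, points_b, covs_a, covs_b)
```

In this example the two points disagree about the motion along
the line of sight, and the estimate follows the nearby point
almost entirely because the distant point's depth is known so
poorly, while both points contribute equally to the lateral
components.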


We have tested this algorithm in simulation by generating random
sets of 3-D points, projecting them onto image planes, adding noise
to the image coordinates, and completing the triangulation and
motion estimation as outlined above. The system of matrix weights
produces motion estimates whose standard deviations are less than
those produced by the scalar method by a factor of three to ten or
more, depending on the motion parameter and the distance of the
points from the camera. For example, over a simulated motion of
1.0 meter straight forward, the matrix method may estimate a
motion of 0.99 meters with a standard deviation of 0.01 meter, while
the scalar method may estimate roughly the same motion with a
standard deviation of 0.05 meters. Details of the approach and the
results will appear in a forthcoming paper.

Our results suggest that error modelling can play an important
part in improving the accuracy of visual ranging and motion
estimation. We plan to extend this work in several directions. The
first is to use it over time, as the vehicle continues to move. One
approach to this is a batch least squares method that estimates all
point and vehicle positions at once. A second approach is an
incremental one that after each step computes only the new vehicle
position and filters the new point position measurements in with the
old. Both methods are likely to find use in a system that combines
short-range navigation with long-range map building. Other
extensions will be to detect correspondence errors and to model
and estimate velocity as well as position.
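
The idea of filtering new point measurements in with the old
can be illustrated in one dimension with inverse-variance
weighting, the scalar analogue of the matrix weights discussed
earlier. The report does not specify the filter it intends to
use, so this is only a sketch under the assumptions of a
stationary point and independent Gaussian errors; the name
`fuse` is hypothetical.

```python
def fuse(mean_old, var_old, mean_new, var_new):
    """Combine an existing estimate with a new measurement of the same
    quantity, weighting each by the inverse of its variance."""
    w_old = 1.0 / var_old
    w_new = 1.0 / var_new
    var = 1.0 / (w_old + w_new)
    mean = var * (w_old * mean_old + w_new * mean_new)
    return mean, var


# Hypothetical depth of one landmark: old estimate 10.0 (variance 4.0),
# new measurement 12.0 (variance 1.0).
mean, var = fuse(10.0, 4.0, 12.0, 1.0)
```

The fused variance is always smaller than either input
variance, which is why repeated observations over time can be
expected to tighten the point positions.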

3.  Aerial  Photo-Interpretation 

We (Dave McKeown) have started a Digital Mapping Laboratory
as a focal point for work in aerial photo interpretation, cartography,
and computer vision.

3.1  MAPS:  Large-Scale  Image  Map  Database 
One of the key issues in building systems utilizing emerging
techniques in AI for applications in cartography and aerial photo
interpretation is the generation and maintenance of a domain
knowledge base. Loosely speaking, this "knowledge base" should
contain known facts and spatial relations between objects in an
area of interest, access to historical or normalcy reports, and
methods to relate earth coordinates to pixel locations in digital
imagery. Unfortunately, these spatial database capabilities are
somewhat different than those found in traditional geographic
information systems. Other issues include methods for spatial
knowledge utilization and representation. For example, simply
having access to cartographic descriptions does not really address
the problem of how to operationalize iconic descriptions for image
analysis and interpretation.

MAPS is a large-scale image/map database system for the
Washington D.C. area that contains approximately 100 high
resolution aerial images, a digital terrain database, and a variety of
map databases from the Defense Mapping Agency (DMA). We have
continued work on the MAPS image/map database system primarily
in the area of integration of map data to support our work in
rule-based airport scene analysis [5, 6]. Much of the geometric
constraint computation in the SPAM system is derived from
facilities to represent image features in geodetic coordinates
<latitude, longitude, elevation>. With partial support from the Defense
Mapping Agency we have added approximately 50 images to our
database of aerial imagery over Washington D.C. and have begun
performance measurements for factual, geometric, and mixed
queries on an expanded Washington D.C. map database.

3.2 SPAM: Rule-Based Interpretation of Airport Scenes
We have continued our development of SPAM, a System for Photo
interpretation of Airports using MAPS [7, 8, 9]. SPAM is a rule-based
image-interpretation system that coordinates and controls image
segmentation, segmentation analysis, and the construction of a
scene model. This work uses results of the MAPS system to provide
a general map description of the airport layout, and tools for spatial
reasoning about size, shape, and position of various airport
features. SPAM has been run on several images of National Airport
in Washington D.C.

SPAM provides several unique capabilities to bring map
knowledge and collateral information to bear during all phases of
the interpretation. These capabilities include:

•  The  use  of  domain-dependent  spatial  constraints  to 
restrict  and  refine  hypothesis  formation  during 
analysis. 

•  The  use  of  explicit  camera  models  that  allow  for  the 
projection  of  map  information  onto  the  image. 

• The use of image-independent metric models for
shape, size, distance, absolute and relative position
computation.

•  The  use  of  multiple  image  cues  to  verify  ambiguous 
segmentations.  Stereo  pairs  or  overlapping  image 
sequences  can  be  used  to  extract  information  or  to 
detect  missing  components  of  the  model. 

With partial support from the Defense Mapping Agency we have
begun to explore other airport configurations within the SPAM
system architecture. Candidate airports are Moffett Field, Dulles
International, and Andrews AFB. We need to validate the current
SPAM system on a variety of airports. In the process of this
validation we will explore the development of tools to aid users in
defining new spatial relationships and airport scene primitives. The
goal of this interactive knowledge acquisition is to directly generate
the rule-based component of the SPAM system from an
intermediate textual representation.

3.3  Stereo  Verification  For  Aerial  Image  Analysis 

As part of the SPAM system, we have produced a flexible stereo
verification system, STEREOSYS, and applied it to the analysis of
high resolution aerial photography [10]. Stereo verification refers
to the verification of hypotheses about a scene by stereo analysis of
the scene. Unlike stereo interpretation, stereo verification requires
only coarse indications of three-dimensional structure. In the case
of aerial photography, this means coarse indications of the heights
of objects above their surroundings. This requirement, together
with requirements for robustness and for dense height
measurements, shapes the decision about the stereo system to use.

Stereo verification is used when an image region has two
candidate object labels that differ in the height of the
corresponding objects. Stereo is used to discriminate between the
two possibilities, by selecting another image of the same land area
from our image database and performing stereo analysis with the
image under interpretation. To do this, we must address stereo
analysis in a very unconstrained environment. Rather than simply
focusing on isolated image analysis where stereo pairs are carefully
controlled, we have constructed a system that can automatically
perform matching and analysis using arbitrarily selected images.
We have found that if knowledge-based image understanding
systems are to begin to perform analysis tasks at a level of
performance required for mapping and photo interpretation, they
must be able to accommodate a much broader range of task
uncertainty and complexity than has been previously demonstrated
in any research or development system.

Stereo verification deals with a variety of problems that are not
ordinarily present in isolated experiments with stereo matching and
analysis. Some of these are:

•  An appropriate conjugate image pair must be selected
from a database of overlapping images based on
criteria that would maximize the likelihood for good
correspondence.

•  The image pairs must be dynamically resampled such
that the epipolar assumption (i.e., epipolars are scan
lines) used in most region-based stereo matching
algorithms can be applied.

•  Because the size of the areas to be matched varies
greatly, the system design must be flexible and general.

•  An initial coarse registration step is generally
necessary because the quality of the correspondence
between conjugate pairs varies greatly. In many cases
the magnitude of the initial misregistration is greater
than the expected disparity shift.

•  The system must analyze the stereo results and
generate a symbolic description that provides an
estimate of the actual height of the region in question,
and the confidence of that estimate. The computation
of a depth map (disparity map) is not a sufficient final
result.

The results of this research indicate that image/map database
issues in stereo verification influence the utility of such an
approach as much as the underlying stereo matching algorithm. In
fact, they are intimately related. The ability to be flexible in the
selection of stereo pairs provides opportunities for multi-temporal,
multi-scale, or multi-viewpoint matching. Equally important is
flexibility in the matching algorithm, especially with respect to
assumptions that require nearly perfectly aligned conjugate images,
a situation that is unlikely to occur outside of the laboratory. We
believe that the ability to dynamically select conjugate image pairs
from a database based upon the region of interest and knowledge
of the requirements of the matching algorithm is required for a fully
automated aerial photo-interpretation system. Our results also
indicate that stereo analysis can function as a very powerful
discriminator in an image understanding system without having to
perform 3D shape reconstruction. That is, coarse estimates of
height, coupled with confidence in those estimates, can greatly
constrain search during image interpretation.
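
To make the idea of a coarse height estimate with an attached confidence concrete, the following minimal sketch chooses between two candidate labels for a region. It is not the STEREOSYS implementation: the function names, the disparity-to-height scale factor, the confidence formula, and the candidate heights are all assumptions made for illustration.

```python
from statistics import median


def coarse_height(region_disp, surround_disp, metres_per_disp_pixel):
    """Estimate a region's height above its surroundings from disparities.

    Returns (height_in_metres, confidence), with confidence in (0, 1].
    """
    d_in = median(region_disp)
    d_out = median(surround_disp)
    height = (d_in - d_out) * metres_per_disp_pixel
    # Median absolute deviation of the region's disparities, in metres.
    spread = median(abs(d - d_in) for d in region_disp) * metres_per_disp_pixel
    # Assumed ad hoc confidence: 1 for perfectly consistent disparities,
    # falling toward 0 as the spread grows.
    confidence = 1.0 / (1.0 + spread)
    return height, confidence


def verify(region_disp, surround_disp, metres_per_disp_pixel,
           candidate_heights, min_confidence=0.3):
    """Pick the candidate label whose nominal height best fits the estimate.

    Returns (label or None, height, confidence); None means "undecided".
    """
    height, confidence = coarse_height(
        region_disp, surround_disp, metres_per_disp_pixel)
    if confidence < min_confidence:
        return None, height, confidence
    label = min(candidate_heights,
                key=lambda name: abs(candidate_heights[name] - height))
    return label, height, confidence


# Hypothetical region that could be flat tarmac or a hangar roof.
label, height, confidence = verify(
    region_disp=[4.0, 4.5, 5.0, 4.5, 4.0],
    surround_disp=[0.5, 0.0, 0.5, 1.0, 0.5],
    metres_per_disp_pixel=2.0,
    candidate_heights={"tarmac": 0.0, "hangar": 10.0},
)
```

Medians are used here only because they tolerate a few bad matches. Any robust summary of the disparities would serve the same purpose.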

4.  Parallel  Architectures  for  Vision 

4.1  The  WARP  Programmable  Systolic  Array  Processor 

We (Jon Webb) are developing a high-performance vision system
on a systolic machine, Warp, which will be actively used by the
vision community at CMU and other sites [1, 3, 4, 19]. The Warp
hardware is a reality, and almost a dozen implementation programs
are now running.

We have set up three key development stages for this project:

1.  Develop a critical mass of basic programs so that users
(vision researchers) won't need to write routine image
processing programs before they attack their own
problem.

2.  Develop a Standard Research Environment of which
the Warp vision system is a part. Unix 4.2 BSD
on a VME-bus-based Sun workstation was selected.

3.  Present  demonstrations  in  the  context  of  the  Strategic 
Computing  Vision  project  at  CMU. 

Webb and Kanade set out a goal of implementing 80% of the
SPIDER Image Processing Package (approximately 200 routines).
So far, the following programs have been implemented in the W1
programming language, a low-level microprogramming language:

•  Fast Fourier transform (1D and 2D)

•  Moravec's interest operator

•  Image add, subtract, multiply, inverse, inverse square
root

•  Histogram

•  n x 3 convolution

•  DOLP (difference of low pass) transform to generate
image pyramids

•  Image pyramid generation (by overlapping 4 x 4
averages)

•  Edge preserving filter (routine EGPR in SPIDER library)

•  Grayvalue transition


All of these programs run in a standard Unix environment, under
Unix 4.2 BSD, running on a Sun 2/170. This Sun has been
integrated into our environment at CMU.

Some of the above programs have been used to drive the
Terregator, an autonomous land vehicle, at speeds of up to one
kilometer per hour, which is much faster than it was capable of
before Warp was used. In fact, the current speed limitation in the
Terregator is not the computing, but the engine power: the
Terregator cannot be driven faster than it is currently with the range
of payloads it must support.


The W2 language recently became capable of compiling useful
image processing programs. We have run demonstrations of the
Terregator using entirely W2 generated code. Some W2 programs
that have been written include:

•  Arbitrary size convolution

•  Median filter

•  Color segmentation of an image into different features

•  Histogram

•  Table lookup

•  Sobel operator

•  Color normalization

In the future, all Warp programming will be done in W2. We expect
this to change the Warp from a machine which is fast but difficult to
use into a genuine research tool.

4.2  Application  of  the  WARP  For  Mobile  Robot  Navigation

In order to show how Warp can be integrated into complex vision
systems we (Gudrun Klinker) are using Warp to perform the basic
vision routines of the Fido system that was built at CMU and
Stanford [18]. Fido is a mobile robot system that uses stereo vision
to navigate through a room, avoiding obstacles. When running on a
single processor (VAX 11/780), it spends about 80% of its time on
three vision routines: data reduction, feature correlation, and the
interest operator. The robot currently proceeds at a speed of 1
meter per 30 seconds.

The three vision routines have been implemented on Warp as
three microprograms. Each routine uses two Warp cells. When they
are run at full speed on two cells, speed improvements by factors of
3.5 (data reduction), 6 (interest operator), and 14 (feature
correlation), compared to the sequential version on the VAX, can be
expected. (In the final Warp system with 10 cells, the factors will be
14, 30, and 84.) However, these speedups are not achieved
currently, since the host system that provides Warp with data has
not yet been developed completely.

Fido has been transferred from the VAX to the Warp host. The
vision routines that use the Warp cells are running as independent
programs. They are currently being integrated into Fido. As a next
step, Fido will be adapted to run on a faster host system, consisting of
two dedicated processors for the communication with Warp and
one master processor that performs the less time consuming tasks
of Fido and supervises the other processors.

5.  Strategic  Computing  Vision

Our goal for the Strategic Computing Vision project at CMU is to
build an intelligent mobile robot capable of operating in the real
world outdoors. We are attacking this on a number of fronts,
ranging from building appropriate research vehicles to exploiting
high-speed experimental computers to building software for
reasoning about the perceived world [13, 15, 16, 17]. This work is
summarized here because much of it is related to our image
understanding research. The highlights of our progress include:

•  Runs with our vehicle continuously moving along paths
and sidewalks, using a television camera to sense the
pavement. We have used several different image
processing techniques, including color-based region
classification and oriented edge tracking.

•  Sonar-based runs cross-country among trees and
obstacles, and at the bottom of a coal mine. Our
sensors are the inexpensive Polaroid sonars. We use
the overlap between the fields of view of neighboring
sonars, and between the fields of view of the same sonar
from different vehicle positions, to reason about
probably empty or probably occupied regions. This
gives us maps with a half foot resolution.


•  Runs through the same trees using the ERIM laser
scanner, using algorithms described in these
proceedings.

•  Successful runs using stereo vision to sense and avoid
obstacles. The Fido vision and navigation system was
originally built for small indoor vehicles. We have
pulled out various separate modules, including path
planning and stereo vision for obstacle detection.

•  The first real application of the prototype Warp systolic
processor. We have made several vision-based runs to
demonstrate the prototype Warp. These runs showed
first that vision programs were easy to put together on
the Warp, and second the potential for high speed runs
when the full Warp arrives.

•  Design of Navlab, a robot van. Some of the most
interesting problems are in fusing data from multiple
sensors. We have designed a mobile robot, to be built
in a converted van, that will have the power and size
needed for these experiments. The design has room to
carry stereo cameras, an ERIM scanner, sonars, several
processors including a Warp, and includes room for
four researchers.

•  Design and first stages of implementation of a software
blackboard system for connecting the output of all the
sensing and reasoning programs into a single view of
the world.

6.  Bibliography 

1.  Gross, T., Kung, H.T., Lam, M. and Webb, J. Warp as a Machine
for Low-level Vision. Proceedings of 1985 IEEE International
Conference on Robotics and Automation, March, 1985, pp.
790-800.

2.  Hebert, M. and Kanade, T. First Results on Outdoor Scene
Analysis Using Range Data. Proc. DARPA Image Understanding
Workshop, Miami Beach, Florida, December, 1985.

3.  Kung, H.T. and Webb, J.A. Global Operations on the CMU Warp
Machine. Proceedings of 1985 AIAA Computers in Aerospace V
Conference, American Institute of Aeronautics and Astronautics,
October, 1985.

4.  Kung, H.T. and Webb, J.A. Global Operations on a Systolic
Array Machine. Proceedings of IEEE International Conference on
Computer Design, IEEE, 1985, pp. 165-171.

5.  McKeown, D.M. Digital Cartography and Photo Interpretation
from a Database Viewpoint. In New Applications of Databases,
Gardarin, G. and Gelenbe, E., Eds., Academic Press, New York,
N.Y., 1984, pp. 19-42.

6.  McKeown, D.M. "Knowledge-Based Aerial Photo
Interpretation." Photogrammetria, Journal of the International
Society for Photogrammetry and Remote Sensing 39 (1984),
91-123. Special issue on Pattern Recognition.

7.  McKeown, D.M. and Denlinger, J.L. Map-Guided Feature Extraction
from Aerial Imagery. Proceedings of Second IEEE Computer
Society Workshop on Computer Vision: Representation and
Control, Annapolis, Maryland, May, 1984. Also available as
Technical Report CMU-CS-84-117.

8.  McKeown, D.M. and Pane, J.F. Alignment and Connection of
Fragmented Linear Features in Aerial Imagery. Proceedings IEEE
Computer Vision and Pattern Recognition Conference, San
Francisco, California, June, 1985. Also available as Technical
Report CMU-CS-85-122.

9.  McKeown, D.M., Harvey, W.A. and McDermott, J. "Rule Based
Interpretation of Aerial Imagery." IEEE Transactions on Pattern
Analysis and Machine Intelligence PAMI-7, 5 (September 1985),
570-585.

10.  McKeown, D.M., McVay, C.A., and Lucas, B.D. Stereo
Verification in Aerial Image Analysis. Proc. DARPA Image
Understanding Workshop, Miami Beach, Florida, December, 1985.

11.  Milenkovic, V.J. and Kanade, T. Trinocular Vision Using
Photometric and Edge Orientation Constraints. Proc. DARPA
Image Understanding Workshop, Miami Beach, Florida,
December, 1985.

12.  Shafer, S.A. The Calibrated Imaging Lab Under Construction
at CMU. Proc. DARPA Image Understanding Workshop, Miami
Beach, Florida, December, 1985.

13.  Stentz, A. and Thorpe, C. An Architecture for
Autonomous Vehicle Navigation. Proceedings of the Fourth
International Symposium on Unmanned Untethered Submersible
Technology, University of New Hampshire Marine Systems
Engineering Lab, June, 1985.

14.  Szeliski, R. Multi-resolution Stereo Using Dynamic
Programming. Internal report of the IUS group at CMU.

15.  Thorpe, C., Stentz, A., and Shafer, S. An Architecture for
Autonomous Vehicle Navigation. Proc. Computers in Aerospace V
Conference, IEEE, Long Beach, California, October, 1985, pp.
22-27.

16.  Wallace, R., Stentz, A., Thorpe, C., Moravec, H., Whittaker, W.
and Kanade, T. First Results in Robot Road-Following. IJCAI-85,
August, 1985.

17.  Wallace, R., Matsuzaki, K., Goto, Y., Webb, J., Crisman, J. and
Kanade, T. Progress in Robot Road Following. Submitted to 1986
IEEE Robotics and Automation Conference.

18.  Webb, J., Klinker, G., and Kanade, T. Parallel Vision
Algorithms on a Systolic Array Machine: WARP. CMU CSD, 1986.
In preparation.

19.  Webb, J. and Kanade, T. Vision on a Systolic Array Machine.
To appear in Computing Structures for Image Processing, edited by
M.J.B. Duff, S. Levialdi, K. Preston and L. Uhr, Academic Press.




IMAGE UNDERSTANDING
RESEARCH AT USC: 1984-85

R. Nevatia

Intelligent Systems Group
Departments of Electrical Engineering
and Computer Science
University of Southern California
Powell Hall Room 234
Los Angeles, CA 90089-0273


1.  INTRODUCTION

Our research in this period has focused on the
following topics:

1.  Description of 3-D surfaces and objects

2.  Stereo analysis and surface interpolation - for
aerial image analysis and for indoor scenes

3.  Mapping from aerial images

4.  Motion analysis

5.  Parallel processing of IU algorithms

6.  Conversion of software to VAX-Unix/Symbolics
environments

Results of these activities are summarized below;
details of 1, 2 and 4 above may be found in other papers
in these proceedings [1, 2, 3, 4].

2.  3-D  SURFACES  AND  OBJECTS

We are following both surface and volume description
methods. Volume descriptions have many advantages, but
are not suitable for certain objects that consist of mostly
thin surfaces. Also, the surface descriptions may aid in
computing volume descriptions.

2.1.  Surface  Descriptions 

Our surface description technique assumes that dense
range (3-D) data is available for the visible surface. From
this data, we wish to detect the lines and points in the
surface that correspond to important physical properties of
the surface. We propose that such
descriptions can be computed from the curvature
properties of a surface. In particular, we believe that the
zero crossings and the extrema of the curvature are of
particular importance. While it is easy to show the
correspondence between curvature properties and physical
properties in principle, computation of curvature requires
estimating second derivatives and hence is prone to noise
or small surface variations. To handle these variations, we
smooth the surface by Gaussian masks of varying variance
and combine the results of different size operators. Our
method has been tested on several synthetic and real
images and is described in detail in [1].

2.2.  Object  Description

Our object description technique assumes that only
sparse 3-D data of an object is available, such as at object
boundaries (in contrast to the dense data needed for
surface descriptions). Even these boundaries may be
incomplete and fragmented, as is typical of 3-D data
derived from stereo, for example. In addition, the data
may contain surface markings and segments due to
noise. Our objective is to find generalized cone
descriptions of the objects in the scene under these
conditions. Our approach is to search for
boundary segments that satisfy certain
mathematical properties expected of generalized cones.
The current method is applicable only to the class of
"linear, straight, homogeneous generalized cones", but we
believe that it will extend to much broader classes.
For this simple case, the "contour generators" of the
generalized cone are known to be coplanar and we
assume the "terminator" to satisfy certain properties also.
Our method and results on a few examples are given in
[2].

3.  STEREO  ANALYSIS 

Stereo analysis has been a topic of continued interest
in image understanding. Earlier, we have presented a line
matching stereo algorithm [5] with interesting results. We
have initiated an effort to evaluate in depth the
performance of various stereo algorithms, in particular
those that do the correspondence matching on a
line-by-line basis. In addition to providing a method for
evaluation, our approach can also correct certain types of
errors, as we know that disparities along a linear segment
in the image must vary linearly. This approach is
described in [3].
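
The linearity constraint can be illustrated with a small sketch: fit a straight line to the disparities measured along one image segment, then replace any measurement that departs from the line by more than a tolerance. This is an assumed two-pass least-squares scheme with hypothetical names and data, not necessarily the procedure of [3].

```python
def fit_line(ts, ds):
    """Least-squares fit d = a + b * t; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_d = sum(ds) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        raise ValueError("need at least two distinct positions")
    b = sum((t - mean_t) * (d - mean_d) for t, d in zip(ts, ds)) / var_t
    return mean_d - b * mean_t, b


def correct_disparities(ts, ds, tol):
    """Enforce linear disparity along a segment.

    ts: positions along the segment, ds: measured disparities.
    Returns (corrected_disparities, (a, b)).
    """
    a, b = fit_line(ts, ds)
    # Keep only measurements close to the first fit, then refit on them
    # so that gross mismatches do not bias the final line.
    inliers = [(t, d) for t, d in zip(ts, ds) if abs(d - (a + b * t)) <= tol]
    if len(inliers) >= 2:
        a, b = fit_line([t for t, _ in inliers], [d for _, d in inliers])
    corrected = [d if abs(d - (a + b * t)) <= tol else a + b * t
                 for t, d in zip(ts, ds)]
    return corrected, (a, b)


# Hypothetical data: disparity 2 + 0.5 * t with one bad match at t = 5.
positions = list(range(10))
measured = [2.0 + 0.5 * t for t in positions]
measured[5] = 12.0
corrected, (a, b) = correct_disparities(positions, measured, tol=2.0)
```

The same residuals also give a simple score for comparing matchers: the fraction of matches along a segment that violate the linear model.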

In another effort, we have been studying the problem
of surface interpolation from the stereo data, as
feature-based stereo gives depth information at intensity edges
only. We have implemented a multi-resolution surface
reconstruction algorithm developed by Terzopoulos at MIT
[6, 7]. Our implementation differs in detail, and we have
tested the algorithm on a wider variety of cases than
described in Terzopoulos' original work. This algorithm
seems to work well when the stereo data is fairly dense,
but the reconstruction quality falls when the input
becomes sparse. Our experiments will be described in a
USC-ISG report under preparation.

4.  PARALLEL  PROCESSING  OF  IU  ALGORITHMS 

The computational needs of IU algorithms are
enormous and parallel processing is a promising way to
speed up their performance. Parallel processing of the
"low-level" IU algorithms (such as convolution and edge
detection) is rather straightforward, but higher level
algorithms do not have such an obvious mapping to a
parallel machine. In earlier work [8], Moldovan has
suggested a cellular architecture, called SNAP, that
consists of an array of grid-connected processing cells
with specialized communications units. In recent work, a
simulator has been written for SNAP and we have
examined the problems of transitioning from iconic
representations to purely symbolic representations in
parallel architectures. A task of object recognition
assuming high-level descriptions has been analyzed.
Results of these studies will be available in a USC-ISG
report under preparation at this time.

5.  MOTION  ANALYSIS 

Estimation of 3-D motion parameters and object shape
from 2-D motion correspondences has been an area of
active research in the IU community lately. It is possible
to formulate this problem mathematically in fairly
straightforward ways and to solve the resulting equations (both
linear and non-linear formulations have been proposed).
Unfortunately, this problem turns out to be
"ill-conditioned" in the sense that small changes in input
parameters cause large changes in the computed
descriptions. We have studied two alternatives to alleviate
this problem.

In one approach, we use the method of "regularization",
popularized by Poggio [9]. In this method, a term
corresponding to the smoothness of the solution is added
to the error function given by the motion equations to
give a more stable solution. Details of this method are
given in another paper in these proceedings [4].
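
A toy example shows both the ill-conditioning and the stabilizing effect of an added penalty term. It is only a sketch under assumed conditions: a 2x2 linear system stands in for the motion equations, and the penalty is the squared norm of the solution (the simplest Tikhonov stabilizer) rather than the smoothness term used in [4]. The function name and all numbers are hypothetical.

```python
def solve_tikhonov_2x2(A, b, lam):
    """Minimise |A x - b|^2 + lam * |x|^2 for a 2x2 matrix A.

    Solves the normal equations (A^T A + lam I) x = A^T b by Cramer's rule.
    With lam = 0 this is the ordinary least-squares solution.
    """
    (a, c), (d, e) = A
    m11 = a * a + d * d + lam
    m12 = a * c + d * e
    m22 = c * c + e * e + lam
    r1 = a * b[0] + d * b[1]
    r2 = c * b[0] + e * b[1]
    det = m11 * m22 - m12 * m12
    return ((r1 * m22 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det)


A = [[1.0, 1.0], [1.0, 1.0001]]   # nearly singular, hence ill-conditioned
b_clean = [2.0, 2.0001]           # exact solution is (1, 1)
b_noisy = [2.0, 2.0002]           # one measurement moved by 1e-4


def shift(lam):
    """Largest change in the solution caused by the small input change."""
    x = solve_tikhonov_2x2(A, b_clean, lam)
    y = solve_tikhonov_2x2(A, b_noisy, lam)
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


plain_shift = shift(0.0)          # roughly 1: the answer moves to about (0, 2)
regularized_shift = shift(1e-3)   # orders of magnitude smaller
regularized_solution = solve_tikhonov_2x2(A, b_clean, 1e-3)
```

The price of the added stability is a small bias: the regularized solution is close to, but not exactly, (1, 1). Choosing the weight lam trades bias against sensitivity.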

In another approach, we are studying the possible use
of "acceleration" parameters, obtained by using 3 or more
frames of motion, in addition to the "velocity" parameters
(obtained from 2 frames) used conventionally. Some
preliminary analytical results are described in [10].

Motion analysis research has now been transitioned to
our strategic computing research program because of its
obvious relevance to the Autonomous Land Vehicle (ALV)
program.


6.  MAPPING  FROM  AERIAL  IMAGES 

Automating map-making from aerial images is a
central task in our research program. In the past, we have
developed methods of linear feature extraction, stereo
mapping and image to map correspondence. We are now
attempting to develop a system to handle complex
mapping tasks; we have initially selected large commercial
airports as our task domain. This domain has a variety of
objects such as long, linear features (runways, taxiways,
roads), a variety of buildings (hangars, terminals, etc.) and
a variety of mobile objects (cars, trucks, airplanes).
Further, the airport complexes are under continual
change, usually due to expansion. The mapping of this
domain, thus, offers a variety of challenging problems.

Our concentration, so far, has been on the mapping of
runways and taxiways (we are pursuing mapping of
buildings under separate funding from the Defense
Mapping Agency). The runways and taxiways may appear
to be modelled easily, namely as long, thin, rectangular
strips of uniform brightness. Unfortunately, the real
images are much more complex. Runways have tire
tread marks and oil spots, usually near the center, and
surface markings along the sides, at the two ends, and
sometimes also in the middle. Also, in certain
geographical locations, the runway surfaces develop
defects that need to be patched; this patching is not
necessarily homogeneous with the original surface
material.

In the following we show steps of our analysis on an
image of the Boston Logan airport. The steps are similar
to those shown for the Los Angeles International Airport
image in our previous report [11]. These experiments are
part of our study to determine the generality of our
methods and assumptions and to develop a better
"knowledge-base" of this task domain.

Figure 1(a) shows an 800x2200 pixel portion of the
image of the Logan Airport in Boston. We first extract line
segments and antiparallels (a pair of parallel line segments
having opposing contrast) using our "LINEAR" software.
The 8,862 segments computed are shown in figure 1(b),
and the 2,212 antiparallels (apars) are shown in figure 1(c).
The edges were thresholded on magnitude before line
fitting, and the minimum and maximum separation
between apar segments was 1 and 56 pixels respectively.

In the second step, we obtain an estimate of the
direction of the runways and their width. Figure 2 shows a
length-weighted histogram of the angles of the segments
in figure 1(b). The histogram clearly shows the two
dominant orientations, which presumably correspond to
the direction of the runways in the image. We perform our
processing for each orientation separately and then we
combine the results.
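
A length-weighted orientation histogram of this kind can be sketched in a few lines. The sketch below is illustrative only: the 2-degree bin width, the peak-picking rule, the minimum peak separation, and the sample segments are assumptions, not details of the LINEAR software.

```python
import math


def orientation_histogram(segments, bin_deg=2.0):
    """Accumulate segment length into orientation bins covering [0, 180).

    segments: iterable of (x1, y1, x2, y2) endpoints in pixels.
    """
    nbins = int(round(180.0 / bin_deg))
    hist = [0.0] * nbins
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        # A line's orientation is undirected, so fold angles into [0, 180).
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        hist[int(angle // bin_deg) % nbins] += length
    return hist


def dominant_orientations(hist, bin_deg=2.0, k=2, min_sep_deg=20.0):
    """Return bin-centre angles of the k heaviest, well-separated bins."""
    order = sorted(range(len(hist)), key=lambda i: hist[i], reverse=True)
    peaks = []
    for i in order:
        if hist[i] <= 0 or len(peaks) == k:
            break
        # Orientation difference on the 180-degree circle.
        seps = [min(abs(i - j) * bin_deg, 180.0 - abs(i - j) * bin_deg)
                for j in peaks]
        if all(s >= min_sep_deg for s in seps):
            peaks.append(i)
    return [(i + 0.5) * bin_deg for i in peaks]


# Hypothetical segments: two runway directions plus a short stray edge.
segments = [
    (0, 0, 100, 60),   # about 31 degrees, long
    (0, 0, -30, 50),   # about 121 degrees, shorter
    (0, 0, 1, 1),      # 45 degrees, very short
]
hist = orientation_histogram(segments)
dominant = dominant_orientations(hist)
```

Weighting by length keeps the many short, randomly oriented edge fragments from swamping the few long segments that lie along runway boundaries.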


Figure 1: (b) Segments detected in Fig. 1(a). (c) Anti-parallel
lines (displayed by medial axes in Fig. 1(b)).

Figure 2: Length-weighted orientation histogram of
the segments in figure 1(b)

Figure 3 shows a length-weighted width histogram of
the apars in figure 1(c) that are oriented parallel to the
angle estimates of runway orientation. The positive
numbers on the right half of the histogram correspond to
bright apar widths (the apar surrounds a region brighter
than its background). The negative numbers correspond to
dark apar widths. Note the group of bright apars having a
width of about 4 pixels. These correspond mostly to the
white line markings which bound the landing surfaces. The
bright and dark apars about 20 pixels wide correspond to
taxiways, runway shoulders and other service roads
parallel to the runways. The dark and bright apars around
the 38 peak width value include the group of apars
corresponding to the landing surfaces bounded by the
white line markings. The bright apars come from the
outside edges of the white markings and the dark apars
come from the inside edges of the white line markings.
We select by hand an estimate of the object's width and
process each group separately.


Figure 3: Length-weighted width histogram of a subset of
the apars shown in figure 1(c)

In the next step we extract apars that are likely to
represent portions of runways. Figure 4(a) shows the apars
(as ribbons) extracted from those in figure 1(c) which are
oriented in the estimated direction of the runways, and
have a width between 30 and 45 pixels and a
minimum length-to-width aspect ratio of 2:1. The
remaining steps attempt to join these portions to form
longer portions.
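
The selection just described amounts to three tests per apar. The sketch below uses the width limits and aspect ratio quoted above; the Apar record, the signed-width convention, and the angular tolerance are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Apar:
    """Hypothetical record for one antiparallel pair (ribbon)."""
    angle_deg: float  # orientation in degrees, in [0, 180)
    width: float      # pixels; positive = bright apar, negative = dark
    length: float     # pixels


def angle_diff(a_deg, b_deg):
    """Smallest difference between two undirected orientations."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def runway_candidates(apars, runway_deg, angle_tol_deg=2.0,
                      min_width=30.0, max_width=45.0, min_aspect=2.0):
    """Keep apars aligned with the runway, of runway-like width and shape."""
    keep = []
    for ap in apars:
        w = abs(ap.width)  # bright and dark apars are both accepted
        if (angle_diff(ap.angle_deg, runway_deg) <= angle_tol_deg
                and min_width <= w <= max_width
                and ap.length >= min_aspect * w):
            keep.append(ap)
    return keep


apars = [
    Apar(31.0, 38.0, 400.0),    # good candidate
    Apar(31.5, -37.0, 60.0),    # too short for its width
    Apar(121.0, 38.0, 400.0),   # belongs to the other orientation
    Apar(30.5, 4.0, 300.0),     # a painted line, far too narrow
    Apar(31.0, -40.0, 900.0),   # good candidate (dark)
]
candidates = runway_candidates(apars, runway_deg=31.0)
```

Running the filter once per dominant orientation mirrors the per-orientation processing described earlier.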


The first criterion we use to join apars is based on
segment continuity: if a given segment is part of two or
more apars, and the other segments in the apars are
collinear, which implies that the apars have the same
width and color, then the apars are joined. Figure 4(b)
shows the resulting ribbons after this criterion is applied to
the apars shown in figure 4(a). Note that we obtain some
apars contained in wider and/or longer apars. In this
example, the wider apars are bright apars formed by
segments in the outside edges of the white markings,
while the narrower ones are dark apars formed by
segments in the inside edges of the white markings. In
other cases, the narrower apars also correspond to
segments in the boundaries of elongated regions on the
landing surface, such as tire tread marks. In this step we
eliminate those apars which are entirely contained in
longer and/or wider apars. The result of this step is shown
in figure 4(c).
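
The containment test in this step can be sketched with ribbons approximated as rectangles in a runway-aligned coordinate frame. That representation, the function names, and the sample values are assumptions for illustration. The source does not say how containment is computed.

```python
def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`.

    Rectangles are (s0, s1, t0, t1): extent along the runway direction
    (s0..s1) and across it (t0..t1), in pixels.
    """
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def eliminate_contained(rects):
    """Drop every rectangle that is contained in another one.

    Exact duplicates contain each other, so only the first copy is kept.
    """
    keep = []
    for i, a in enumerate(rects):
        dominated = any(
            contains(b, a) and (a != b or j < i)
            for j, b in enumerate(rects) if j != i
        )
        if not dominated:
            keep.append(a)
    return keep


ribbons = [
    (0, 100, 0, 40),    # bright apar from the outer edges of the markings
    (10, 90, 2, 38),    # dark apar from the inner edges, inside the first
    (50, 150, 0, 40),   # overlaps the first but is not contained in it
    (0, 100, 0, 40),    # duplicate of the first
]
survivors = eliminate_contained(ribbons)
```

The quadratic pairwise comparison is adequate at this stage because few ribbons remain after the orientation and width filters.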

In the next step we examine each one of the remaining
apars and determine whether they are good runway
candidates for further attempts to join them. If we
assume that most of the line segments contained inside a
ribbon must be oriented in the direction of the runway,
such as those representing intermittent white markings in
the center of the landing surface, as well as other
markings which tend to be oriented in the direction of the
runway, then we can determine which apars can be
discarded by examining the line segments in a window
whose size and position is that of the ribbon. Currently
we require that the length-weighted orientation histogram
of the segments in the ribbon show that 90% of the
segments be oriented parallel to the estimated runway
direction. Further refinements are needed at this step to
account for common occurrences of changes in runway
surface materials due to runway extensions, 'patches' of
surface repair having random shapes, intersections, runway
heading numbers, aircraft and so on. For this example, no
apars were discarded at this step.
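The 90% requirement can be sketched as follows. This is only an illustration under stated assumptions: segments are given as endpoint tuples (x1, y1, x2, y2), "parallel" means within an angular tolerance of the runway direction (the 10-degree default is an assumed value, not one given in the text), and a window with no segments is rejected. The function names are hypothetical.

```python
import math


def parallel_length_fraction(segments, runway_angle_deg, tolerance_deg=10.0):
    """Fraction of total segment length oriented along the runway direction.

    Orientation is undirected, so angles are compared modulo 180 degrees.
    Returns 0.0 when the window holds no segments (an assumed convention).
    """
    total = 0.0
    parallel = 0.0
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0.0:
            continue  # degenerate segment carries no orientation
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        diff = abs(angle - runway_angle_deg) % 180.0
        diff = min(diff, 180.0 - diff)
        total += length
        if diff <= tolerance_deg:
            parallel += length
    return parallel / total if total > 0.0 else 0.0


def is_runway_candidate(segments, runway_angle_deg,
                        min_fraction=0.9, tolerance_deg=10.0):
    """Keep a ribbon if at least `min_fraction` of the length is parallel."""
    fraction = parallel_length_fraction(segments, runway_angle_deg,
                                        tolerance_deg)
    return fraction >= min_fraction


# Two long markings along the runway (angle 0) and one short cross marking.
window = [(0, 0, 100, 0), (0, 5, 50, 5), (10, 0, 10, 10)]
fraction = parallel_length_fraction(window, 0.0)
keep = is_runway_candidate(window, 0.0)
```

Weighting by length means a few short cross markings do not outvote long centerline markings; here 150 of 160 length units are parallel, so the ribbon is kept.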

The last step in our example is similar to the one just
described, except that we examine the segments in the
gap between two collinear ribbons having the same width.
If the length-weighted histogram of the segments in the
gap shows that 90% of them are oriented in the direction
of the runway, we join the two ribbons. Figure 4(d) shows
the final result after the short ribbons have been
discarded. We note that this result provides us with a good
estimate of the location of the runways in the scene, and
is suitable for applying further detailed analysis leading to
the description of these objects.




7 SOFTWARE CONVERSION

Last, but not least, we have devoted a considerable
amount of effort in the last year to converting our
software. Until last year our computing engine was a DEC
PDP-10 and our software was written primarily in SAIL
with some small parts being in LISP (UCI-LISP). Since the
beginning of 1985, we have had access to some DEC VAX
11/750 systems; some Symbolics 3640 systems have been
available since summer of 1985. Our current plan is to
use only these new systems as soon as is practical. Our
plans are to develop most of the software in LISP, with
some of the lower level programs being in C (lack of an
efficient LISP for the VAXes remains a major problem). We
are also attempting to maintain compatibility with "SRI IU
Testbed" standards (on the VAX) and SRI ImageCalc
standards on the Symbolics systems.

As our software on the PDP-10 was developed over a
period of many years, the conversion effort required is
rather large (and not entirely painless). Nonetheless, we
have made significant progress. Our linear feature
extraction system (includes "Nevatia-Babu" and
"Marr-Hildreth" edge detectors, thinning, linking and linear
approximation) is now available to run in C. The programs
to analyze surfaces using curvature have also been
converted to C on a VAX; all new development is in C. Our
linear feature matching method for image to map
correspondence [12], stereo matching [5] and region
relaxation labelling programs [13] have been largely
converted to run in LISP and are being exercised on a
number of new images. Table 1 summarizes the state of
this transition and some comparisons of relative run-times
on different systems.


Table 1: State of the software conversion. Columns: Subsystem,
Target System, Status, Approximate Performance (run-times
relative to the original PDP-10 version). The individual table
entries are illegible in the source scan.


References

1. Fan, T.J., Medioni, G. and Nevatia, R., "Description of
3-D Surfaces Using Curvature," Proceedings of
DARPA Image Understanding Workshop,
Miami, Fla., December 1985.

2. Rao, G. Kashipati and Nevatia, R., "Generalized Cone
Description From Sparse 3-D Data," Proceedings
of DARPA Image Understanding Workshop,
Miami, Fla., December 1985.

3. Mohan, R., "Error Detection and Correction for Stereo,"
Proceedings of DARPA Image
Understanding Workshop, Miami, Fla., December
1985.

4. Yasumoto, Yoshio and Medioni, Gerard, "Robust
Estimation of 3-D Motion Parameters From a
Sequence of Image Frames Using Regularization,"
Proceedings of DARPA Image
Understanding Workshop, Miami, Fla., December
1985.

5. Medioni, G. and Nevatia, R., "Segment-Based Stereo
Matching," Computer Vision, Graphics and
Image Processing, Vol. 31, No. 1, July 1985, pp.
2-18.

6. Terzopoulos, D., Multiresolution Computation
of Visible-Surface Representations, PhD
dissertation, Massachusetts Institute of Technology,
Departments of Computer Science and Electrical
Engineering, January 1984.

7. Terzopoulos, D., "Computing Visible Surface
Representations," Massachusetts Institute of
Technology AI Lab, AI Memo 800, 1985.

8. Dixit, V. and Moldovan, D.I., "Semantic Network Array
Processor and Its Application to Image
Understanding," Proceedings of DARPA Image
Understanding Workshop, New Orleans, October
1984, pp. 85-71.

9. Poggio, T., "Massachusetts Institute of Technology
Progress in Understanding Images," Proceedings
of Image Understanding Workshop, New
Orleans, LA, October 1984, pp. 14-21.

10. Shariat-Panahi, Hormoz, "The Motion Problem: A
Decomposition-Based Solution," Proceedings of
Computer Vision and Pattern Recognition
Conference, San Francisco, Ca., June 19-23, 1985.

11. Nevatia, R., "Image Understanding Research at USC:
1983-84," Proceedings of DARPA Image
Understanding Workshop, 1984.

12. Medioni, G. and Nevatia, R., "Matching Images Using
Linear Features," IEEE Transactions on
Pattern Analysis and Machine
Intelligence, Vol. 6, No. 6, November 1984, pp.
675-685.

13. Price, K., "Relaxation Matching Techniques - A
Comparison," IEEE Pattern Analysis and
Machine Intelligence, Vol. 7, No. 5,
September 1985, pp. 617-623.




Recent Progress of the
Rochester Image Understanding Project


J.A. Feldman and C.M. Brown


Computer Science Department
University of Rochester
Rochester, New York 14627


1. Robust Vision Operators

1.1.  Parameter  Networks  and  the  Hough  Transform 

One of the most difficult problems in vision is
segmentation. Recent work has shown how to calculate
intrinsic images (e.g., optical flow, surface orientation,
occluding contour, and disparity). These images are
distinctly easier to segment than the original intensity
images. Such techniques can be greatly improved by
incorporating Hough methods. The Hough transform idea
has been developed into a general control technique.
Intrinsic image points are mapped (many to one) into
parameter networks [Ballard, 1983]. This theory explains
segmentation in terms of highly parallel cooperative
computation among intrinsic images and a set of
parameter spaces at different levels of abstraction.
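The many-to-one mapping from image points into a parameter space can be illustrated with the classical Hough transform for straight lines. The sketch below is not the parameter-network formulation cited above; it is a minimal textbook version, assuming the normal parameterization rho = x cos(theta) + y sin(theta) and a sparse accumulator. The function names and bin sizes are illustrative choices.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta index, rho bin) cells.

    Each point votes once per theta bin, for the line through it with
    that normal direction; collinear points pile up in a single cell.
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho / rho_step))] += 1
    return votes


def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta in degrees, rho, vote count) of the best cell."""
    votes = hough_lines(points, n_theta, rho_step)
    (k, rho_bin), count = votes.most_common(1)[0]
    return math.degrees(math.pi * k / n_theta), rho_bin * rho_step, count


# One hundred points on the horizontal line y = 5, plus two outliers.
points = [(x, 5) for x in range(100)] + [(3, 40), (70, 22)]
theta_deg, rho, count = strongest_line(points)
```

All one hundred collinear points fall into the single cell with normal direction 90 degrees and rho = 5, while the outliers scatter their votes elsewhere, which is what makes the accumulator peak a robust grouping cue.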

Earlier  work  on  the  Hough  transform  [Brown,  1983; 
Brown  &  Sher,  1982]  has  led  in  three  directions. 

1) Research toward a theory of cache accumulator
arrays [Loui, 1983; Brown & Feldman, 1983]

2) Experiments with complementary HT and
cache management strategies [Brown et al., 1983]

3) Hardware (VLSI) designs for HT vote caches
[Sher & Tevanian, 1983].

Recent efforts extend these ideas and are moving them
into the parallel computing environment of the BBN
Butterfly.

1.2  Bayesian  Detectors 

A recent extension of our work on low level operators
involves the exploration of optimal operators for early
vision. This is one aspect of our increased effort on formal
evidence theory and its application in intelligent systems
[Kyburg, 1985; Sher, 1985].

1.3  High  Level  Planning 

In general, problem solvers cannot hope to create plans
that are able to specify fully all the details of operation
beforehand and must depend on run-time modification of
the plan to insure correct functioning. The run-time
planning idea becomes particularly important when
different plan segments are being explored concurrently.
These communicating segments may require sophisticated
actions, e.g. (do PLANX until PLANY). These issues were
studied by [Russell, 1984] in the context of a cooperative
planning and execution system for manipulation tasks. A
recent effort [Ballard, 1984] is examining robot planning
from a task frame perspective.

2.  Computing  with  Connections 

We are continuing our interest in problem-scale
parallelism, both as a model of animal brains and as a
paradigm for VLSI and parallel computing [Feldman et
al., 1984]. Work at Rochester has concentrated on
connectionist models and their application to vision. The
framework is built around computational modules, the
simplest of which are termed p-units. We have developed
their properties and shown how they can be applied to a
variety of problems [Feldman & Ballard, 1982]. We have
also established powerful techniques for adaptation and
change in these networks [Feldman, 1982].

A major milestone was achieved with Sabbah's thesis
on massively parallel recognition of Origami-world objects
[Sabbah, 1982]. Sabbah's work extended the connectionist
methodology to a problem domain with several
hierarchical structural levels. The resulting program is, to
our knowledge, the most noise-resistant system for dealing
with this level of complexity. One outcome of Sabbah's
effort has been a project to build a general purpose
simulator for massively parallel systems [Small et al.,
1982].

The general connectionist simulator has been well
tested and is being used in a number of applications. One
project involved a quite detailed simulation of motor
control networks of the oculo-motor system [Addanki,
1983]. Another application is to a spreading activation
model of word sense disambiguation and related problems
in natural language understanding [Cottrell, 1985; Cottrell
& Small, 1983]. A major effort involves modelling
conceptual knowledge (such as that needed for high level
vision) in connectionist terms [Feldman & Shastri, 1984;
Shastri & Feldman, 1984].

The second generation connectionist system has now
been in active use for well over a year. Compatible
versions run on the VAX systems, SUN workstations and
the BBN Butterfly multi-computer. As expected, the
Butterfly version makes excellent use of the available
parallelism. This is one of several approaches at Rochester
to exploiting the parallel computing capabilities of the
Butterfly for image understanding.

3.  Parallel  Computation  in  Image  Understanding 

It has been clear for many years that practical solutions
to Image Understanding problems will require parallel
computation. Great progress has been made in early vision
and in the development of machines specialized for these
computations. Intermediate and recognition level vision
are more complex and it is much less obvious how to
compute them in parallel. This has been a major focus of
the Rochester effort for several years.

Our approach to parallelism in Image Understanding is
based on the belief that general purpose (MIMD)
machines will work out best for the full range of vision
problems [Feldman, 1985c]. Our work has taken a major
step forward with the successful installation of a 128-processor
BBN Butterfly computer. In addition to the
massively parallel approaches mentioned above, we are
looking at conventional vision algorithms and at general
purpose parallel programming methodologies [Brown et
al., 1985].

4.  Motion  and  Texture 

Our interest in motion has centered around methods
for extracting rigid body parameters from optic flow and
intensity images. These parameters are extremely useful in
navigation and target tracking. Early work showed how
these nine parameters (origin, translational velocity,
rotational velocity) can be extracted from flow via a
Hough technique [Ballard & Kimball, 1983]. A more
recent model exploits multiple channels [Bandyopadhyay,
1984]. We are also pursuing the use of these parameters to
speed up the flow computations themselves [Stuth et al.,
1983]. A major current effort relates optical flow
information to surface orientation [Aloimonos & Brown,
1984a] and sensor motion [Aloimonos & Brown, 1984b].

Recent work at Rochester has characterized the various
motion algorithms according to their dependence on flow
or matching and on the assumed geometric transform. This
has led to clarification and several new results on the
number of degrees of freedom needed in different
paradigms [Aloimonos & Bandyopadhyay, 1985;
Bandyopadhyay & Aloimonos, 1985].

There has also been a renewed effort at exploiting
texture, particularly the relative size and orientation of
texture elements. This has led to nice results on shape
[Aloimonos & Swain, 1985; Aloimonos & Chou, 1985]
with potential applications elsewhere. Other work concerns
shape from multiple views [Aloimonos et al., 1985].

5.  Shape 

The description and recognition of complex shapes
continues to be a major focus of the project. The analysis
of the dot product space representation has been improved
to handle certain pathological cases, and has been
generalized to accommodate different criteria for the
goodness of the representation.


This simple concept of shape has been applied to the
problem of reconstructing three-dimensional surfaces from
very sparse data. The key idea is to use appropriate shape
descriptors to hypothesize a transformation which accounts
for the difference in shape between successive contours.
When the hypothesized transformation is minor, very
simple-minded surface reconstruction techniques are
sufficient. When there are major differences in shape or
position between successive contours, our method
hallucinates new contours, using the hypothesized shape
transformation [Sloan & Hrechanyk, 1981]. A major new
effort is the extraction and use of symmetries in images
[Friedberg & Brown, 1984].

Hierarchical descriptions of shapes were considered in
[Ballard & Sabbah, 1981] in a preliminary fashion. Our
previously reported shape model [Hrechanyk & Ballard,
1982] concentrated on problems of view-invariance and
attention shifting within a single prototype. This model has
been extended to handle the problems of extracting
primitive shape descriptions from noisy images. Our work
was motivated by dissatisfactions with smoothness criteria
for intrinsic image computations. Recent work extends
these ideas to simple 3-D shapes [Ballard et al., 1984].


The practicality of shape from shading computations
and their interaction with the determination of other
image parameters (such as illuminant position) was
addressed by two papers in the Fall 1982 DARPA Image
Understanding Workshop. We are now applying the
algorithm to real images, and want to investigate scenes
with non-Lambertian reflectance functions that are
unknown a priori. We want to explain how humans in fact
use shading to derive shape, given the complexity of
reflectance functions and imaging situations in the world.
Two competing theories are that somehow the reflectance
functions are derived fairly accurately by an adaptive
procedure, or instead that we only 'support' a small
number of reflectance functions that are selected by other
cues (such as gloss).

6.  General  Theory  of  Vision 

Work in our laboratory, among others, has
demonstrated strong links between powerful Image
Understanding techniques and computations used by
animal visual systems. We have established strong ties with
a wide range of visual scientists at Rochester and a variety
of collaborative efforts are underway. One early project is
to survey the computational similarities in natural and
computer vision [Ballard & Coleman, 1983].

One part of our general effort in understanding vision
(and related problems) is a comprehensive look at
evidence theory in AI. One can view recognition and
decision problems as combining uncertain evidence, and
the formal method of combination is a critical question.
We have examined the traditional [Loui et al., 1985] and
Dempster-Shafer [Kyburg, 1985] methods and discovered
some of their strengths and weaknesses. An evidential
system based on the maximum-entropy principle has been
quite effective in simple problems of categorization and
inheritance [Shastri, 1985; Shastri & Feldman, 1985].
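As a concrete reminder of what a formal method of combination looks like, the sketch below implements the standard Dempster rule for combining two mass functions over the same frame of discernment. It illustrates the Dempster-Shafer method in general and is not the evidential system described above; the hypothesis names and mass values are invented for the example.

```python
def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps frozensets of hypotheses to masses that
    sum to 1. Mass falling on empty intersections is treated as
    conflict and the remaining mass is renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


RUNWAY = frozenset({"runway"})
ROAD = frozenset({"road"})
EITHER = RUNWAY | ROAD  # the whole frame: no commitment

# Hypothetical evidence from two independent cues.
shape_cue = {RUNWAY: 0.6, EITHER: 0.4}
context_cue = {ROAD: 0.5, EITHER: 0.5}
belief = combine_dempster(shape_cue, context_cue)
```

Here 0.3 of the product mass is conflicting (runway against road) and is normalized away, leaving 3/7 on runway and 2/7 each on road and on the uncommitted set. How that discarded conflict should be treated is one of the usual points of debate when comparing this rule with Bayesian updating.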




We have begun to exploit Rochester neurobiology
expertise in order to hone and improve our connectionist
modelling effort. One difficult avenue is to specify the
interface between our computational models and the
state-of-the-art neurobiological picture. Our efforts in this
direction are summarized in [Ballard, to appear] and the
collaboration is continuing. Another effort is our attempt
to develop a general framework for theories of vision that
would provide a common structure for integrating studies
from various disciplines [Feldman, 1985a]. These efforts
are already being reflected back on our applied Image
Understanding efforts [Feldman, 1985b].

References 

Addanki, S., "On a Distributed Approach to Oculomotor
Control," TR121, Computer Science Dept., Univ.
Rochester, 1983.

Aloimonos, J., "One Eye Suffices: A Computational
Model of Monocular Depth Perception," TR160,
Computer Science Dept., Univ. Rochester, December
1984.

Aloimonos, J. and A. Bandyopadhyay, "Perception of
Structure from Motion: Lower Bound Results,"
TR158, Computer Science Dept., Univ. Rochester,
March 1985.

Aloimonos, J., A. Bandyopadhyay and P. Chou, "On the
Foundations of Trinocular Machine Vision," Technical
Digest, Topical Meeting of Optical Society of America,
Lake Tahoe, April 1985.

Aloimonos, J. and C.M. Brown, "The Relationship
Between Optical Flow and Surface Orientation,"
Proceedings, 7th ICPR, Montreal, August, 1984a.

Aloimonos, J. and C.M. Brown, "Direct Processing of
Curvilinear Sensor Motion From a Sequence of
Perspective Images," Proceedings, 1984 IEEE
Workshop on Computer Vision Representation and
Control, Annapolis, Md., 72-77, May, 1984b.

Aloimonos, J. and P. Chou, "Detection of Surface
Orientation from Texture," TR161, Optic News,
September 1985.

Aloimonos, J. and M. Swain, "Shape from Texture,"
Proceedings, IJCAI, Los Angeles, August 1985.

Ballard, D.H., "Cortical Connections: Structure and
Function," TR133, Computer Science Dept., Univ.
Rochester, July 1984; to appear, Behavioral and Brain
Sciences.

Ballard, D.H., "Task Frames in Robot Manipulation,"
Proceedings, AAAI-84, August, 1984.

Ballard, D.H., "Parameter Networks: Towards a Theory of
Low-Level Vision," Proceedings, 7th Int'l. Joint Conf.
on Artificial Intelligence, Vancouver, British Columbia,
August 1981.


Ballard, D.H. and O.A. Kimball, "Rigid Body Motion
from Depth and Optical Flow," CVGIP Special Issue
on Computer Vision, 1983; also TR70, Computer
Science Dept., Univ. Rochester, November 1981.

Ballard, D.H. and D. Sabbah, "Detecting Object
Orientation from Surface Normals," Proceedings, 7th
IJCAI, Vancouver, British Columbia, August 1981.

Ballard, D.H. and H. Tanaka, "Transformational Form
Perception in 3D: Constraints, Algorithms,
Implementation," Proceedings, Int'l. Joint Conf. on
Artificial Intelligence, Los Angeles, CA., 964-968,
August 1985.

Ballard, D.H., A. Bandyopadhyay, J. Sullins and H.
Tanaka, "A Connectionist Polyhedral Model of
Extrapersonal Space," Proceedings, 1984 IEEE Conf.
on Computer Vision, Annapolis, MD., May, 1984.

Bandyopadhyay, A., "Constraints on the Computation of
Rigid Motion Parameters from Retinal
Displacements," TR168, Computer Science Dept.,
Univ. Rochester, October 1985.

Bandyopadhyay, A., "A Multiple Channel Model for
Perception of Optical Flow," Proceedings, 1984
Workshop on Computer Vision Representation and
Control, Annapolis, Md., May, 1984, 78-82.

Bandyopadhyay, A., "Interest Points, Disparities and
Correspondence," Proceedings, DARPA Image
Understanding Workshop, New Orleans, La., 1984.

Bandyopadhyay, A. and J. Aloimonos, "Perception of
Rigid Motion from Spatiotemporal Derivatives of
Optical Flow," TR157, Computer Science Dept., Univ.
Rochester, March 1985.

Bandyopadhyay, A. and J. Aloimonos, "Perception of
Motion for Rigid Objects," TR169, Computer Science
Dept., Univ. Rochester, forthcoming.

Brown, C.M., "Bias and Noise in Hough Transform:
Theory," Pattern Analysis and Machine Intelligence,
1983; also, TR105, Computer Science Dept., Univ.
Rochester, July 1982.

Brown, C.M., M. Curtiss, and D. Sher, "Bias & Noise in
Hough Transform: Experiments," Proceedings, IJCAI-83,
Karlsruhe, West Germany, August, 1983; also
TR113, Computer Science Dept., Univ. Rochester,
1982.

Brown, C.M., C.S. Ellis, J.A. Feldman, S.A. Friedberg and
T.J. LeBlanc, "Artificial Intelligence Research on the
Butterfly Multiprocessor," Proceedings, Workshop on
AI and Distributed Problem Solving, National
Academy of Sciences, Washington, DC, 109-118, May,
1985.

Brown, C.M. and J.A. Feldman, "Statistical Questions
Arising in the Use of Hough Techniques in Image
Understanding," Proceedings, ONR Workshop on
Statistical Image Processing and Graphics Workshop,
Luray, VA., 24-27 May 1983.




Brown, C.M. and D. Sher, "Modeling the Sequential
Behavior of Hough Transform Schemes," Proceedings,
DARPA Image Understanding Workshop, November,
1982; also TR114, Computer Science Dept., Univ.
Rochester, August 1982.

Cottrell, G.W., "A Connectionist Approach to Word Sense
Disambiguation," Ph.D. thesis, Computer Science
Dept., Univ. Rochester, April 1985; also TR145,
Computer Science Dept., Univ. Rochester, May, 1985.

Cottrell, G.W., "Parallelism in Inheritance Hierarchies
with Exceptions," Proceedings, Int'l. Joint Conf. on
Artificial Intelligence, Los Angeles, CA, 194-202,
August 1985.

Cottrell, G.W. and S.L. Small, "A Connectionist Scheme
for Modelling Word Sense Disambiguation," Cognition
and Brain Theory, 6 (1), 89-120, 1983.

Feldman, J.A., "A Connectionist Model of Visual
Memory," in Parallel Models of Associative Memory,
G.E. Hinton and J.A. Anderson (eds.), Hillsdale, NJ:
Lawrence Erlbaum Associates, publishers, 1981.

Feldman, J.A., "Dynamic Connections in Neural
Networks," Biological Cybernetics, 46, 27-39, 1982.

Feldman, J.A., "Four Frames Suffice: A Provisional
Model of Vision and Space," Behavioral and Brain
Sciences, 8, 265-289, June 1985a.

Feldman, J.A., "Connectionist Models and Parallelism in
High Level Vision," Computer Vision, Graphics and
Image Processing, 31, 178-200, 1985b.

Feldman, J.A., "Connections: Massive Parallelism in
Natural and Artificial Intelligence," BYTE, 277-284,
April 1985c.

Feldman, J.A., "Energy and the Behavior of Connectionist
Models," TR155, Computer Science Dept., Univ.
Rochester, November, 1985d.

Feldman, J.A., "A Functional Model of Vision and
Space," to appear, Vision, Brain and Cooperative
Computation, 1986.

Feldman, J.A. and D.H. Ballard, "Connectionist Models
and Their Properties," Cognitive Science, 6, 205-254,
1982.

Feldman, J.A. and L. Shastri, "Evidential Inference in
Activation Networks," Proceedings, Cognitive Science
Conference, Boulder, Co., July 1984.

Feldman, J.A. and L. Shastri, "Neural Nets, Routines and
Semantic Networks," to appear, Directions in Cognitive
Science, 1986.

Feldman, J.A., D.H. Ballard, C.M. Brown and S.L. Small,
"Rochester Connectionist Papers, 1979-1984," TR124,
Computer Science Dept., Univ. Rochester, June 1984.

Friedberg, S.A. and C.M. Brown, "Symmetry Evaluators,"
Proceedings, DARPA Image Understanding Workshop,
New Orleans, LA, 1984.

Friedberg, S.A. and C.M. Brown, "Finding Axes of
Skewed Symmetry," Proceedings, 7th ICPR, Montreal,
August, 1984.

Hrechanyk, L.M. and D.H. Ballard, "Viewframes: A
Connectionist Model of Form Perception,"
Proceedings, DARPA Workshop, June 1983.


Hrechanyk, L.M. and D.H. Ballard, "A Connectionist
Model of Form Perception," Proceedings, Workshop
on Computer Vision: Representation and Control,
Rindge, New Hampshire, August 1982; also, Computer
Science and Engineering Research Review, Computer
Science Dept., Fall, 1982.

Kyburg, H.E. Jr., "Bayesian and Non-Bayesian Evidential
Updating," TR139 (revised), Computer Science Dept.,
Univ. Rochester, July 1985.

Lampeter, W., "Design, Function and Performance of a
System that Screens Chest Radiographs for Tumors,"
Proceedings, 1st CVPR Conference, June 1983.

Loui, R.P., J.A. Feldman and H.E. Kyburg, Jr., "Interval-Based
Decisions for Reasoning Systems," Proceedings,
AAAI Workshop on Probability and Uncertainty in
AI, 1985.

Russell, D.M., "Schema-based Problem Solving," Ph.D.
Dissertation, Computer Science Dept., Univ.
Rochester, December 1984.

Sabbah, D., "Computing with Connections in Visual
Recognition of Origami Objects," Cognitive Science, 9,
1, 25-50, Winter, 1985.

Shastri, L., "Evidential Reasoning in Semantic Networks:
A Formal Theory and its Parallel Implementation,"
Ph.D. thesis, Computer Science Dept., Univ.
Rochester, July 1985; also, TR166, Computer Science
Dept., Univ. Rochester, September 1985.

Shastri, L. and J.A. Feldman, "Semantic Networks and
Neural Nets," TR131, Computer Science Dept., Univ.
Rochester, June, 1984.

Shastri, L. and J.A. Feldman, "Evidential Reasoning in
Semantic Networks: A Formal Theory," Proceedings,
IJCAI, August 1985.

Sher, D., "Template Matching on Parallel Architectures,"
TR156, Computer Science Dept., Univ. Rochester, July
1985.

Sher, D., "Developing and Analyzing Boundary Detection
Operators Using Probabilistic Models," Proceedings,
ACM Workshop on Uncertainty and Probability in
Artificial Intelligence, August, 1985.

Sher, D. and A. Tevanian, "A Hough Chip," Internal
course project report, Computer Science Dept., Univ.
Rochester, April 1983.

Small, S.L., L. Shastri, M.L. Brucks, S.G. Kaufman, G.W.
Cottrell and S. Addanki, "ISCON: A Network
Construction Aid and Simulator for Connectionist
Models," TR109, Computer Science Dept., Univ.
Rochester, September 1982.

Sloan, K.R., Jr. and L.M. Hrechanyk, "Surface
Reconstruction from Sparse Data," Proceedings:
Pattern Recognition and Image Processing, Dallas,
Texas, August, 1981.

Stuth, B.H., D.H. Ballard and C.M. Brown, "Boundary
Conditions in Multiple Intrinsic Images," Proceedings,
IJCAI-83, Karlsruhe, West Germany, 1983.


SPATIAL  UNDERSTANDING 


Thomas O. Binford


Stanford  Artificial  Intelligence  Laboratory 


ABSTRACT 

Our goal is to build an intelligent stereo vision system which
incorporates both general and special knowledge. Its purpose is
stereo mapping, construction of surface maps with symbolic
information, e.g. horizontal and vertical surfaces and feature analysis.
A preliminary report is made of a high level stereo system which
finds correspondence among extended curves, junctions, and surfaces.

The system relies on extended curves and vertices obtained
from a new edge segmentation and curve linking system described in
these proceedings. Another report describes psychophysics
experiments to infer human mechanisms for detecting image
structures by mechanisms which are parallel. Striking new results are
obtained; a mechanism is suggested for computer perception. An
analysis and experiments are presented concerning specular reflection
in order to estimate principal curvatures of surfaces from the width
of spectral peaks, and to make symbolic predictions from object
models.

An algorithm is reported for labeling line drawings of opaque,
curved objects bounded by piecewise smooth surfaces. The method is
mathematically rigorous and complete for scenes without surface
markings and illumination discontinuities.

INTRODUCTION 

The  goal  of  this  research  program  is  to  build  an  intelligent 
stereo  vision  system  by  developing  comprehensive,  fundamental 
geometric  representation  to  express  generic  world  constraints  and 
special-purpose,  domain-specific  knowledge.  This  research  in  model- 
based  systems  is  relevant  to  general  systems  as  well  as  systems  for 
limited classes of objects. The research is directed toward solving
major  problems  faced  in  building  image  understanding  systems: 


1.  The  software  problem:  Programming  major  systems  and 
application  programs  is  the  dominant  computer  science  problem.  It 
is  accentuated  by  the  difficulty  of  computational  vision  and  the 
special  knowledge  required.  We  investigate  automatic  model-based 
generation  of  programs. 

2. Computation: Vision algorithms often require about four
orders of magnitude more computation than VAX-class general
computers provide. We develop general methods for prediction from
models to cut computation by automating focus of attention, use of
special-purpose shortcuts, and choice of efficient strategies for
perception.

3. Segmentation: Segmentation algorithms have many
weaknesses. We are building models of sensors and operators in
order to automate selection of effective features tailored for
individual problems. We study incorporation of available knowledge
at all levels.

4. General system: The breadth of computational perception
is enormous. We define a representation for generic models and
constraints in order to build systems from a set of compact and
uniform perceptual mechanisms with broad relevance.

One application objective of our research is construction of
surface maps with symbolic information, e.g. horizontal and vertical
surfaces. A second is construction of feature maps, which is entirely
manual and very time-consuming. A third objective is measurement of
surface dimensions in photointerpretation.

To meet these objectives we are building a high level stereo
system intended to incorporate a broad base of constraints and
knowledge at all levels of structure. This system is to incorporate
inference rules and quasi-invariants for surface interpretation [Binford
81]. Stereo systems evolve to include more powerful geometric
constraints. However, there is a long way to go. Most systems have
used few and weak constraints, mostly restricted to epipolar
correspondence.

Figure 1 shows our structure for an intelligent stereo vision
system. On one hand are models, on the other hand are
observations. The system makes predictions from models; it
constructs observations from observations and data. The system
matches observations with models. There are four gross levels of
system structure:

1. object classes are functional classes in four-dimensional
space-time, behavior classes;

2.  surface  and  volume  structures  in  three  space  include  viewer- 
centered  and  object-centered  representations; 

3.  image  structures  in  two  space  relate  image  features; 

4.  signals  and  images  are  raw  data. 

Program modules construct elements in the boxes. "Shape-
from-x" modules build surface observations, by direct range
measurement, by stereo, by observer motion, by object motion, by
shading, by photometric stereo, by inference and interpolation from
image shape, or by texture. Segmentation and aggregation modules
extract image features and build image structural descriptions.

Typically,  models  refer  to  models  of  individual  objects.  Here 
models  permeate  the  system.  There  are  models  at  all  levels;  at  each 
level  models  range  from  generic  to  individual.  Individual  models 
have  included  individual  aircraft  and  industrial  parts.  Generic 
models  include  generalized  cylinder  parts. 

NEW  STEREO  SYSTEM 

[Lim 85] reports on the status of a new stereo system. Our
previous systems incorporated continuity among edge elements
between epipolar lines [Arnold 78], incorporated quasi-invariants for
matching corresponding edges and surface elements within epipolar
lines [Arnold 80], extended the Viterbi algorithm for making
global surface correspondence within epipolar lines [Arnold 82], and
interpolated surfaces between edge elements within epipolar lines
[Baker 81].


The new system relates extended curves, junctions, and
surfaces. The system relies on high quality output from a
segmentation system [Nalwa 85]. Lim has improved the
determination of junctions in order to match corresponding views of
junctions.

Curves must correspond if real and visible, i.e. unless
extraneous in one image, missing in one image, or obscured in one
image. The new system finds maximal connected components of
corresponding curves; curves which are obscured are now identified
by preliminary methods which are being extended. The aim is to
incorporate the line labeling analysis of [Malik 85] and further
geometric constraints. These conditions apply to curved surfaces.
Note that quasi-invariants have already been applied in stereo vision
systems [Arnold 80]. These developments make the interpretation of
line drawings a key element of potentially practical systems, after it
had long appeared to be an academic exercise. Likelihood of
extraneous or missing curves will be modeled using a generic model of
observability and by modeling segmentation operators. Some missing
curves can be verified by subsequent curve verification operators. In
other work, [Treindl 84] describe matching strategies based on
building local structures and determining the least ambiguous
structures.

The system has had initial tests with one scene for which stereo
line drawings were obtained automatically, and with line drawings
input by hand. Even with improvements in vertex determination,
some vertices in real data do not correspond. However, matching of
curves is effective in these cases.


SEGMENTATION  AND  AGGREGATION 

Segmentation describes the local structure of images in terms of
features. Aggregation describes global structure of images in terms
of natural relations among image features. Image structural relations
often reflect spatial coherence of objects. Thus, image structures
suggest candidate objects and provide a form of figure-ground
discrimination.

For model-based vision systems, matching models against
combinatorial sets of features, as ACRONYM does, is
computationally expensive and limits their utility greatly. Determining
candidate objects for matching cuts the combinatorics of matching,
making model-based interpretation more widely relevant. This is a
divide-and-conquer approach.




[Nalwa 84, 85a] describe an edge segmentation based on
determining edge elements (edgels) which are discontinuities in the
intensity surface. These are oriented edgels defined on a 5x5 disk by
a one-dimensional step in the form of a tanh function along an edgel
direction. Edgel orientation and transverse location are estimated
with relatively high resolution, near information limits. Extensive
error analysis was carried out on the operator.
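The edgel model described above can be illustrated with a minimal sketch. It samples a one-dimensional tanh step on the pixels of a small disk inside a 5x5 window. The function name, the `sharpness` parameter, and the choice of a disk radius of 2.5 pixels are assumptions made for illustration; the operator of [Nalwa 84, 85a] fits such a model to image data and is not reproduced here.

```python
import math

def tanh_edgel_template(theta, offset=0.0, sharpness=1.0, radius=2.5):
    """Ideal step-edge profile sampled on a small disk of pixels.

    theta     -- direction of the edge normal, in radians
    offset    -- signed distance of the edge from the disk centre (pixels)
    sharpness -- scale of the tanh transition (hypothetical parameter)
    radius    -- disk radius; 2.5 keeps 21 pixels of a 5x5 window
    Returns {(x, y): value} with values in (-1, 1).
    """
    nx, ny = math.cos(theta), math.sin(theta)
    template = {}
    for y in range(-2, 3):
        for x in range(-2, 3):
            if x * x + y * y <= radius * radius:
                d = x * nx + y * ny - offset  # signed distance to the edge
                template[(x, y)] = math.tanh(sharpness * d)
    return template

# A vertical edge through the centre: intensity varies only along x.
vertical_edge = tanh_edgel_template(theta=0.0)
```

Fitting `theta` and `offset` to observed intensities (for example by least squares) would give the orientation and transverse location mentioned in the text; that fitting step is omitted.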

The output of the operator is a set of disconnected edgels which
know nothing about neighboring edgels. [Nalwa 85b] determine
extended curves linking edgels and estimate best-fit conic curves to
linked edges. They link edgels by mapping extended, directional
edgels onto a high-resolution grid with half the spacing of the original
grid, then thinning to preserve continuity but minimal connectivity.
Chains of connected edgels are obtained by contour following
between junctions.

In order to segment chains of connected edgels, dependence of
tangent angle vs arc length is used. Straight segments have constant
angle. Circular arcs have linear angle vs arc length. Angle vs arc
length is used to find straight line segments, to detect corners, and to
choose candidate knots. Non-straight segments are described by
conic sections. Arc length is a notoriously ill-defined parameter; a
polygonal approximation was used to estimate arc length. A distance
measure for a point from a conic was derived which offered
improvements over previous formulations. This measure produces
better conic fits than other programs. For conics, it is necessary to
segment at inflection points. In attempting to define corners, the
edgel operator is appropriate for single step edges on the disk. At
corners, the operator is inappropriate; the disk limits resolution and
causes averaging. A high curvature area the size of the disk is
indistinguishable from a sharp corner. The program requires a
minimum angle change at a corner.
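The tangent angle vs arc length description can be sketched as follows. This is a minimal illustration, not the program of [Nalwa 85b]: arc length is estimated from chord lengths of the polygonal chain, and a least-squares line is fitted to the angle samples. The function names and the test curves are assumptions. A slope near zero indicates a straight segment, and a constant non-zero slope (the curvature) indicates a circular arc.

```python
import math

def tangent_angle_vs_arc_length(points):
    """Return (arc_lengths, angles) for a polyline [(x, y), ...].

    Arc length is the running sum of chord lengths (polygonal
    approximation); angles are unwrapped to vary continuously.
    """
    arc, angles = [], []
    s, prev = 0.0, None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        theta = math.atan2(y1 - y0, x1 - x0)
        if prev is not None:
            # keep the change from the previous angle within (-pi, pi]
            while theta - prev > math.pi:
                theta -= 2 * math.pi
            while theta - prev <= -math.pi:
                theta += 2 * math.pi
        arc.append(s + step / 2)  # arc length at the chord midpoint
        angles.append(theta)
        s += step
        prev = theta
    return arc, angles

def slope_stats(arc, angles):
    """Least-squares slope d(angle)/d(arc length) and max fit residual."""
    n = len(arc)
    ms, mt = sum(arc) / n, sum(angles) / n
    sxx = sum((s - ms) ** 2 for s in arc)
    sxy = sum((s - ms) * (t - mt) for s, t in zip(arc, angles))
    slope = sxy / sxx
    resid = max(abs((t - mt) - slope * (s - ms)) for s, t in zip(arc, angles))
    return slope, resid

line = [(i, 2 * i) for i in range(10)]
circle = [(5 * math.cos(0.1 * k), 5 * math.sin(0.1 * k)) for k in range(20)]
line_slope, line_resid = slope_stats(*tangent_angle_vs_arc_length(line))
circ_slope, circ_resid = slope_stats(*tangent_angle_vs_arc_length(circle))
```

For the circle of radius 5 the fitted slope is close to the curvature 1/5, while the straight line gives a slope of essentially zero. Corner detection would look for a jump in angle larger than some minimum between neighboring samples.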

Previous work has gone on to determine structures beyond
extended edges, especially collinear curves [Lowe 83]. This work goes
on in [Vistnes 85], which describes psychophysics experiments to infer
human mechanisms for detecting dotted curves in random dot
backgrounds. Mechanisms were investigated which might be parallel,
i.e. pre-attentive without scanning and eye motion, with less than 200
msec exposure. Interesting results were obtained:

1. grouping of dotted curves is not "diameter-limited", i.e.
dotted curves are visible even for large spacing;

2. grouping of dotted curves is "background-limited", i.e.
dotted curves are visible over a wide range of dot spacing and
background dot spacing, as long as target dot spacing is less than
background dot spacing; this scaling phenomenon is the dominant
feature of these experiments;

3.  results  were  not  sensitive  to  line  length  or  regularity  of 
target  dot  spacing; 

4.  results  depended  on  target  curvature  and  transverse  point 
scatter. 

Because  random  dots  formed  noticeable  clusters,  more  uniform, 
pseudo-random  dots  were  generated  by  variations  about  grid 
positions  on  a  coarse  grid. 
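A background of this kind can be generated with a few lines. The sketch below is an assumption about the procedure, not the stimulus generator used in [Vistnes 85]: each dot is displaced from its coarse grid position by a uniform random amount, and the names `spacing` and `jitter` are hypothetical.

```python
import random

def jittered_grid_dots(rows, cols, spacing, jitter, seed=0):
    """Pseudo-random dots: one dot per grid cell, displaced uniformly
    by at most `jitter` in x and y about its grid position."""
    rng = random.Random(seed)
    dots = []
    for r in range(rows):
        for c in range(cols):
            dx = rng.uniform(-jitter, jitter)
            dy = rng.uniform(-jitter, jitter)
            dots.append((c * spacing + dx, r * spacing + dy))
    return dots

dots = jittered_grid_dots(rows=4, cols=5, spacing=10.0, jitter=3.0)
```

Because the jitter is smaller than half the grid spacing, no two dots can come closer than `spacing - 2 * jitter`, which suppresses the noticeable clusters that arise with fully random dots.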

Results with orientation of segments indicate that contrary to a
previous experiment [Riley 81], humans can discriminate textures of
segments with two orientations from textures with random
orientations.


INTERPRETATION OF SURFACES

[Malik 85] presents an algorithm for labeling line drawings of
opaque, curved objects bounded by piecewise smooth surfaces. The
scheme is mathematically rigorous and complete for scenes without
surface markings, shadows, or illumination boundaries. Objects with
surfaces which have point discontinuities, e.g. cones, are excluded.
The analysis is for orthographic projection.

The analysis deals with generic properties for general
viewpoint. The projection of neighborhoods on surfaces and edges is
catalogued. Whitney's results on singularities of projection are relied
upon [Whitney 55]. For a smooth surface the only singularities are
folds and cusps. There are, in addition, discontinuities at edges.

A result is proved that curved surfaces can be replaced by their
tangent planes to give identical tangent lines in images, i.e., image
vertices can be replaced by equivalent polyhedral vertices. A major
objective is to achieve reasonable interpretations and avoid the large
number of interpretations which even simple scenes like a tetrahedron
allow. A local simplicity condition seems successful: find vertex
interpretations with the minimum number of faces meeting at a
vertex.

[Kirousis 84] show that the labeling problem is NP-complete.
Nonetheless, by considering a reduced labeling, the practical
complexity (distinct from worst case complexity) is quite low. An
efficient algorithm was determined. A program was implemented to
analyze line drawings constructed by hand.


SPECULARITY 

[Healey 85] analyze specular reflection for two purposes: to
estimate properties of surfaces, and to make symbolic predictions
from object models. They use the Torrance-Sparrow model of
specular reflection from a rough surface.

Specular reflections are often the brightest, highest contrast,
most prominent features in images. Thus they are often easy to find.
In experiments with ACRONYM [Chelberg 84], specularities were
found to be important. An effort was begun to include them in
future systems. To be useful in general systems, the emphasis was on
deriving symbolic, quasi-invariant results which are generic.

The model includes surface roughness by assuming a
distribution of surface orientations about the nominal. It also
includes a shading factor which is important at glancing incidence.

The analysis provides estimates of surface principal curvatures
from the width of the specular peak. It also provides predictions of
widths of spectral peaks in terms of surface principal curvatures. A
set of experiments has been carried out to demonstrate these results.

REFERENCES

[Arnold 78] D. Arnold, "Local context in matching edges for
stereo vision", Proceedings: Image Understanding Workshop, May
1978.

[Arnold 83] D. Arnold, "Automated stereo perception", Ph.D.
thesis, Stanford University, March 1983.

[Arnold 80] R. D. Arnold, T. O. Binford, "Geometric Constraints in
Stereo Vision", Proc. SPIE Meeting, San Diego, Cal., July 1980.

[Baker 81] H. Baker and T. O. Binford, "Depth from Edge and
Intensity Based Stereo", Proc. Int. Joint Conf. on AI, Aug. 1981.

[Binford 81] T. O. Binford, "Inferring surfaces from images",
Artificial Intelligence, Volume 17, 1981.

[Chelberg 84] D. Chelberg, H. Lim, C. Cowan, "Acronym Model-
Based Vision in the Intelligent Task Automation Project", Proc.
DARPA IU Workshop, 1984.

[Healey 85] G. Healey, T. O. Binford, "Predicting Specular
Features", Proc. DARPA IU Workshop, 1985.

[Kirousis 84] L. Kirousis and C. H. Papadimitriou, "Complexity
of Recognizing Polyhedral Scenes", Stanford Tech. Rept., 1984.

[Lim 85] H. S. Lim, T. O. Binford, "Stereo Correspondence:
Features and Constraints", Proc. DARPA IU Workshop, 1985.

[Lowe 83] D. Lowe, T. O. Binford, "Perceptual Organization as a
Basis for Visual Recognition", Proc. DARPA IU Workshop, May 1983.

[Malik 85] J. Malik, "Labeling Line Drawings of Curved
Objects", Proc. DARPA IU Workshop, 1985.

[Nalwa 84] V. S. Nalwa, "On Detecting Edges", Proc. DARPA IU
Workshop, 1984.

[Nalwa 85a] V. S. Nalwa, T. O. Binford, "On Detecting Edges",
accepted for IEEE PAMI, 1985.

[Nalwa 85b] V. S. Nalwa, E. Pauchon, "Edgel-Aggregation and
Edge-Description", Proc. DARPA IU Workshop, 1985.

[Riley 81] M. Riley, "The Representation of Image Texture",
MIT AI Lab Memo TR, 1981.

[Vistnes 85] R. Vistnes, "Detecting Structure in Random Dot
Patterns", Proc. DARPA IU Workshop, 1985.

[Whitney 55] H. Whitney, "Singularities of mappings of
Euclidean Spaces: I: Mappings of the plane into the plane", Ann.
Math. 62, 374, 1955.




MIT  PROGRESS  IN  UNDERSTANDING  IMAGES 


T. Poggio and the staff


The  Artificial  Intelligence  Laboratory,  Massachusetts  Institute  of  Technology 




Our work to date has focused primarily on
the initial processes and representations of low and
intermediate-level vision that decode information about
the 3-D surfaces and their properties. We are now turning
our efforts to the integration of different sources of
information. In this report we review the work during
the last year that has preceded this new integration
project and was still focused on specific problems.

In particular, we will discuss regularization theory
and some of its new extensions, surface reconstruction,
the computation of color, learning of regularization
algorithms, parallel multigrid algorithms, representation
of 3-D shape, object recognition, the development of an
artificial eye-head system, the understanding of
trajectories and the computation of spatial properties.

1. Early Vision


The first part of vision, from images to surfaces, has
been termed early and intermediate vision. Since at
least the work of Marr (see Marr, 1982) it is widely
assumed that it is mainly bottom-up, relying on general
knowledge but no special high level information about
the scene to be analysed. Although this point of view
has been embraced widely, it is important to observe
that its correctness is still to be proven. In particular,
it is still unclear what the nature of the 2-1/2 D sketch
representation is, how different visual modules interact
and their outputs are fused, and what the role of high-level
knowledge is in early visual processes. The critical
problem of the organization of vision and of the control of
the flow of information from the different modules and
high-level knowledge is still unresolved. We are now
beginning to approach this set of problems by attempting
to integrate different visual modules to provide a sparse,
symbolic and robust representation of space around the
viewer. The main idea is that the first step in vision is
fast, coarse and bottom-up, with the goal of providing
a sparse and symbolic representation of 3-D surfaces,
just sufficient for indexing a data base and for coarse
navigation. This first step must be fast and robust. For
robustness and reliability under generic conditions, it
exploits strongly the fusion of different low-level
modules such as stereo, motion, color, edge detection and
labeling of physical edges. The problem of interaction
of the different modules is made manageable by a coarse
and symbolic description of surfaces and their
properties. Once coarse recognition is achieved, the mode of
processing shifts to top-down: model verification, task
dependent routines and representations take over,
focusing on small parts of the visual field, possibly with
high resolution. We believe that the rigorous analysis
of individual modules of vision that we have carried out
until now will play an important role in any new theory
of vision.

The standard definition of computational vision is
that it is inverse optics. The direct problem, the
problem of classical optics or computer graphics, is to
determine the images of three-dimensional objects.
Computational vision is confronted with inverse problems
of recovering surfaces from images. Much information
is lost during the imaging process that projects
the three-dimensional world into two-dimensional arrays
(images). As a consequence, vision must rely on natural
constraints, that is, general assumptions about the
physical world, to derive an unambiguous output. This is
typical for many inverse problems in mathematics and
physics.

As we described in the last report, the common
characteristics of most early vision problems, in a sense
their deep structure, can be formalized: early vision
problems are ill-posed in the sense defined by Hadamard.

Bertero, Poggio and Torre (1980) show precisely
how several problems listed in Table 1 are formally
ill-posed. (See also Poggio and Torre, 1984.) The
recognition that early vision problems are ill-posed suggests
immediately the use of regularization methods developed
in mathematics and mathematical physics (Poggio and
Torre, 1984).

1.1. Regularization theory and extensions

The main idea for "solving" ill-posed problems is to
restrict the class of admissible solutions by introducing
suitable a priori knowledge. In standard regularization
methods, due mainly to Tikhonov, the regularization of
the ill-posed problem of finding $z$ from the data $y$:

$$ Az = y $$

requires the choice of norms $\|\cdot\|$ and of a stabilizing
functional $\|Pz\|$. In standard regularization theory, $A$
is a linear operator, the norms are quadratic and $P$ is
linear. A method that can be applied is:

Find $z$ that minimizes

$$ \|Az - y\|^2 + \lambda \|Pz\|^2 \qquad (1) $$

where $\lambda$ is a so-called regularization parameter.


In this method, $\lambda$ controls the compromise between
the degree of regularization of a solution and its
closeness to the data (the first term in Equation 1). $P$
embeds the physical constraints of the problem. It can
be shown for quadratic variational principles that under
mild conditions the solution space is convex and a
unique solution exists.
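A small numerical sketch may make Equation 1 concrete. It assumes a discrete one-dimensional setting that is not in the original text: $A$ samples a 9-point signal at three positions, $P$ is a second-difference operator, and the minimizer is obtained from the normal equations $(A^T A + \lambda P^T P) z = A^T y$. All names (`gram`, `solve`, `tikhonov`) are illustrative.

```python
def gram(M):
    """Return M^T M for a matrix stored as a list of rows."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)]
            for i in range(n)]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def tikhonov(A, P, y, lam):
    """Minimize ||A z - y||^2 + lam ||P z||^2 via the normal equations."""
    n = len(A[0])
    AtA, PtP = gram(A), gram(P)
    M = [[AtA[i][j] + lam * PtP[i][j] for j in range(n)] for i in range(n)]
    rhs = [sum(A[r][i] * y[r] for r in range(len(A))) for i in range(n)]
    return solve(M, rhs)

n = 9
samples = [0, 4, 8]                       # only three of nine values observed
A = [[1.0 if j == s else 0.0 for j in range(n)] for s in samples]
P = [[1.0 if j in (i, i + 2) else -2.0 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n - 2)]   # second differences
y = [0.0, 4.0, 8.0]                       # samples of the line z_i = i
z = tikhonov(A, P, y, lam=1.0)
```

With three observations and nine unknowns, $Az = y$ alone has infinitely many solutions. Adding the stabilizer makes the problem well-posed: here the unique minimizer is the straight line through the samples, which has zero data error and zero second difference.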

Poggio et al. (1981, 1980) show that several
problems in early vision can be "solved" by standard
regularization techniques. Surface reconstruction, optical
flow at each point in the image, optical flow along
contours, color, and stereo can be computed by using standard
regularization techniques. Variational principles that
are not exactly quadratic but have the same form as
Equation 1 can be used for other problems in early
vision. The main results of Tikhonov can, in fact, be
extended to some cases in which the operators A and P
are nonlinear, provided they satisfy certain conditions
(Morozov, 1984).

The problems that can be regularized by standard
regularization theory or slightly non-linear versions of it
are listed in Table 1, together with the associated
regularization principles.


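TABLE 1

Problem                        Regularization Principle

Edge detection                 $\int [(Sf - i)^2 + \lambda (f_{xx})^2]\,dx$

Optical Flow (area based)      $\int [(i_x u + i_y v + i_t)^2 + \lambda (u_x^2 + u_y^2 + v_x^2 + v_y^2)]\,dx\,dy$

Optical Flow (contour based)   $\int [(V \cdot N - V^N)^2 + \lambda (\partial V / \partial s)^2]\,ds$

Surface Reconstruction         $\int [(S \cdot f - d)^2 + \lambda (f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2)]\,dx\,dy$

Spatiotemporal approximation   $\int [(S \cdot f - i)^2 + \lambda (\nabla f \cdot V + f_t)^2]\,dx\,dy\,dt$

Color                          $\|I^{\nu} - Az\|^2 + \lambda \|Pz\|^2$

Shape from Shading             $\int [(E - R(f, g))^2 + \lambda (f_x^2 + f_y^2 + g_x^2 + g_y^2)]\,dx\,dy$

Stereo                         $\int \{[\nabla^2 G * (L(x, y) - R(x + d(x, y), y))]^2 + \lambda (\nabla d)^2\}\,dx\,dy$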



1.1.1. Limitations of Standard Regularization
Theory

This new theoretical framework for early vision shows
clearly not only the attractions, but also the limitations
that are intrinsic to the standard Tikhonov form of
regularization theory. Standard regularization methods lead
to satisfactory solutions of early vision problems but
cannot deal effectively and directly with a few general
problems such as discontinuities and fusion of
information from multiple modules.

Standard regularization theory with linear A and
P is equivalent to restricting the space of solutions to
generalized splines, whose order depends on the order
of the stabilizer P. This means that in some cases the
solution is too smooth, and cannot be faithful in
locations where discontinuities are present. In optical flow,
surface reconstruction and stereo, discontinuities are in
fact not only present, but also the most critical locations
for subsequent visual information processing. Standard
regularization cannot easily deal with another critical
problem of vision, the problem of fusing information
from different early vision modules. Since the
regularizing principles of the standard theory are quadratic,
they lead to linear Euler-Lagrange equations. The
output of different modules can therefore be combined only
in a linear way. Terzopoulos (1984; see also Poggio et
al., 1984, 1985) has shown how standard regularization
techniques can be used in the presence of discontinuities
in the case of surface interpolation. After standard
regularization, locations where the solution f originates a
large error in the second term of Equation 1 are
identified (this needs setting a threshold for the error in
smoothness). A second regularization step is then
performed using the location of discontinuities as boundary
conditions.
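The two-step procedure can be sketched in one dimension. The sketch is an illustration under stated assumptions, not the surface interpolation method of Terzopoulos (1984): it uses a first-order (membrane) stabilizer on a chain of samples, a fixed hypothetical threshold on the squared difference between neighbors, and it treats a detected discontinuity as a free boundary by removing the smoothness coupling across it.

```python
def membrane_fit(d, lam, links):
    """Minimize sum_i (f_i - d_i)^2 + lam * sum_i links[i] * (f_{i+1} - f_i)^2.

    links[i] is 1.0 for an intact link between samples i and i+1 and
    0.0 across a marked discontinuity. The tridiagonal system is solved
    with the Thomas algorithm.
    """
    n = len(d)
    diag = [1.0] * n
    off = [-lam * w for w in links]       # couples samples i and i+1
    for i, w in enumerate(links):
        diag[i] += lam * w
        diag[i + 1] += lam * w
    c = [0.0] * n
    g = [0.0] * n
    c[0] = off[0] / diag[0]
    g[0] = d[0] / diag[0]
    for i in range(1, n):                 # forward elimination
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        g[i] = (d[i] - off[i - 1] * g[i - 1]) / denom
    f = [0.0] * n
    f[n - 1] = g[n - 1]
    for i in range(n - 2, -1, -1):        # back substitution
        f[i] = g[i] - c[i] * f[i + 1]
    return f

d = [0.0] * 8 + [10.0] * 8                # a step edge in the data
links = [1.0] * (len(d) - 1)

# Step 1: standard regularization; the step is smoothed over.
f1 = membrane_fit(d, lam=4.0, links=links)

# Threshold the local smoothness error to locate discontinuities.
errors = [(f1[i + 1] - f1[i]) ** 2 for i in range(len(d) - 1)]
threshold = 4.0                           # hypothetical value
breaks = [i for i, e in enumerate(errors) if e > threshold]

# Step 2: regularize again without smoothing across the breaks.
for i in breaks:
    links[i] = 0.0
f2 = membrane_fit(d, lam=4.0, links=links)
```

In this example the only link exceeding the threshold is the one at the step, and the second pass recovers the step exactly. The result depends on the threshold, which the text notes must be set by hand.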

A similar method could be used for fusing
information from multiple sources: a regularizing step could
be performed and locations where terms of the type of
the first term of Equation 1 give large errors would be
identified. A decision step would then follow by setting
appropriately various controlling parameters in those
locations, therefore weighting in an appropriate way (for
instance, vetoing some of) the various contributing
processes.

In any case, one would like more comprehensive and
coherent theories capable of dealing directly with the
problem of discontinuities and the problem of fusing
information. So the challenge for a regularization theory
of early vision is to extend it beyond standard
regularization methods and their most obvious non-linear
versions. We will describe two such attempts: the first,
due to Marroquin, Mitter and Poggio (1985), is based
on optimal stochastic estimation in the form proposed
by Geman and Geman (1985); the second, due to
Terzopoulos, attempts to apply some of the ideas of the
stochastic approach to the continuous and deterministic
framework of standard regularization.

1.2. A stochastic approach

In the stochastic approach, the a priori knowledge is
represented in terms of an appropriate probability
distribution, whereas in standard regularization a priori
knowledge leads to restrictions on the solution space.
This distribution, together with a probabilistic
description of the noise that corrupts the observations, allows
one to use Bayes theory to compute the posterior
distribution $P_{f|g}$, which represents the likelihood of a solution
$f$ given the observations $g$. In this way, we can solve the
reconstruction problem by finding the estimate $f$ which
either maximizes this likelihood (the so-called
Maximum a Posteriori or MAP estimate), or minimizes the
expected value (with respect to $P_{f|g}$) of an
appropriate error function. The class of solutions that can be
obtained in this way is much larger than in standard
regularization. In particular, Marroquin, Mitter and
Poggio (this volume) show under which conditions this
new method leads to solutions that are of the standard
regularization type.

The price to be paid for this increased flexibility is
computational complexity. New parallel architectures
and possibly hybrid computers of the digital-analog type
promise, however, to deal effectively with the
computational requirements of the methods proposed here. We
will discuss these new parallel architectures in a later
section.

The main results obtained by Marroquin (1985) are
(see Marroquin et al., this volume):

(A) The connection between error criteria and the
optimal Bayesian estimators. The MAP estimate is not
optimal for the most natural error criteria. Optimal
estimators can use Metropolis-type algorithms, but do
not need annealing schemes.

(B) They derive algorithms for optimal estimation
of surface reconstruction and for signal matching (for
instance, for binocular stereo).

(C) New hybrid parallel computers (digital and
analog) on which coupled MRF models map naturally.
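Result (A) can be illustrated on a toy posterior. The distribution below is invented for illustration, and the error criterion, the expected number of wrongly labeled pixels, is taken here as one example of a natural criterion; the estimators are found by exhaustive enumeration rather than by the Metropolis-type algorithms mentioned above.

```python
# Hypothetical posterior over a two-pixel binary field (illustrative values).
posterior = {(0, 0): 0.4, (1, 0): 0.3, (1, 1): 0.3}

def map_estimate(post):
    """Configuration with the highest posterior probability."""
    return max(post, key=post.get)

def marginal_estimate(post):
    """Pixel-by-pixel maximizer of the posterior marginals."""
    n = len(next(iter(post)))
    est = []
    for i in range(n):
        p_one = sum(p for cfg, p in post.items() if cfg[i] == 1)
        est.append(1 if p_one > 0.5 else 0)
    return tuple(est)

def expected_pixel_errors(estimate, post):
    """Expected number of pixels at which the estimate is wrong."""
    return sum(p * sum(a != b for a, b in zip(estimate, cfg))
               for cfg, p in post.items())

map_cfg = map_estimate(posterior)            # (0, 0)
mpm_cfg = marginal_estimate(posterior)       # (1, 0)
map_cost = expected_pixel_errors(map_cfg, posterior)
mpm_cost = expected_pixel_errors(mpm_cfg, posterior)
```

The MAP configuration is the single most probable field, yet it has a larger expected number of pixel errors than the estimate built from the marginals. Marginals can be estimated by sampling at a fixed temperature, which is consistent with the remark that annealing is not needed.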




1.2.1. Learning

A critical problem for the practical feasibility of the
MRF approach to vision is the estimation of the
parameters of the MRF and noise models that are required in
each specific case. Estimation from the noisy data alone
is in general a very difficult problem, although specific
results have been obtained (see section 1.1). It may be,
however, of some interest to consider an easier problem,
that of supervised learning with noiseless examples, in
the context of MRF formulations.

1.3. Controlled-continuity constraints and
discontinuity detection

Terzopoulos (1985b) proposes a class of
controlled-continuity constraints for the regularization solution
of ill-posed visual problems in an arbitrary number
of dimensions. These generic constraints may surpass
conventional smoothness constraints by offering precise
control over spatial continuity. This may make it
possible to optimally reconstruct, by solving well-posed
variational principles, not only continuous regions, but
multiple-order discontinuities as well (global
smoothness constraints are clearly inadequate for
reconstructing discontinuities).

Controlled-continuity constraints are constructed
from generalized spline kernels, which characterize
regions, combined with continuity control functions
which characterize boundaries. Unlike global
smoothness constraints, which lead to quadratic variational
principles, controlled-continuity constraints are
formulated as nonquadratic functionals, in general. The
resulting variational principles characterize nonlinear,
spatially noninvariant PDEs. Under certain conditions
these problems have Bayesian estimation and
nonlinear filtering interpretations (this formulation was in fact
suggested by the Bayes approach of Geman and Geman
(1984) and Marroquin (1984, 1985a)).

Terzopoulos (1985b) observes that optimal
estimation of discontinuities with controlled-continuity
constraints poses a nonlinear problem in distributed
parameter identification. The parameters to be identified
from incomplete data are the continuity control
functions. This too is an ill-posed inverse problem and it is
most naturally regularized, in recursive fashion, using
embedded controlled-continuity models of lower
dimensionality.

A particular controlled-continuity model, the thin
plate surface under tension, has been applied in a
framework for computing visible-surface representations from
multiple sources of impoverished visual information,
and a multistage, nonlinear optimization strategy has
been developed for identifying and reconstructing
discontinuities (Terzopoulos, 1985a). The strategy is
deterministic and efficient.

1.4. Regularizing Color

Another problem that is presently approached with
regularization techniques is the computation of spectral
reflectance of surfaces. An important goal of computer
vision is to recover the invariant spectral reflectance
properties of an object's surface independently of the
illuminant. Poggio and Hurlbert (unpublished) are
developing a class of regularization algorithms for
"solving" this problem, that we now describe in some detail
(since it will remain otherwise unpublished).

The data measured by the sensors are:

$$S^{\nu}(x) = \int a^{\nu}(\lambda)\, R(\lambda, x)\, E(\lambda, x)\, d\lambda, \qquad (2)$$

where $\nu$ labels the spectral type of sensors ($\nu = 1, \ldots, 3$
for R-G-B), $a^{\nu}(\lambda)$ is the spectral sensitivity of the
$\nu$th-type sensor (or filter) and $S^{\nu}(x)$ its output.
$R(\lambda, x)$ is the surface reflectance and $E(\lambda, x)$ the
effective illumination intensity that takes into account 3D shading
effects and possibly shadows (Hurlbert, 1985).

Equation (2) shows that R and E cannot be determined
from S and a uniquely without additional constraints.
In any case, there is clearly a factor - the relative
scaling of E and R - that cannot be determined
unless the illuminant is observed directly.
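The scaling ambiguity can be checked numerically. The following is a minimal sketch, assuming a Riemann-sum discretization of Equation (2) at a single image point; the wavelength grid, the Gaussian sensor sensitivities and the two spectra are made-up illustrative values, not data from this report.

```python
import numpy as np

# Hypothetical wavelength grid (nm); all spectra below are illustrative only.
lam = np.linspace(400.0, 700.0, 301)
dlam = lam[1] - lam[0]

def gaussian(center, width):
    """Smooth bump used as a stand-in for a measured spectrum."""
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

# Three sensor sensitivities a^nu(lambda), nu = 1..3 (R, G, B).
a = np.stack([gaussian(600.0, 40.0), gaussian(540.0, 40.0), gaussian(450.0, 40.0)])

R = 0.2 + 0.6 * gaussian(580.0, 60.0)   # surface reflectance at one point x
E = 1.0 + 0.002 * (lam - 400.0)         # effective illumination at the same x

def sensor_outputs(a, R, E):
    """Riemann-sum approximation of Equation (2) for every sensor type."""
    return (a * R * E).sum(axis=1) * dlam

S = sensor_outputs(a, R, E)

# Dividing R by k and multiplying E by k leaves every sensor output unchanged,
# so the relative scaling of E and R cannot be recovered from S alone.
k = 3.7
S_rescaled = sensor_outputs(a, R / k, k * E)
```

Here `S` and `S_rescaled` agree to rounding error for any nonzero `k`, which is the undetermined factor mentioned above.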

Regularization algorithms may exploit several constraints
that are usually true for the albedo and the
reflectance. The main constraints listed by Hurlbert and
Poggio are (see also Rubin and Richards, 1984):

1) Typical illuminants and typical albedos have a finite
(and small) number of degrees of freedom. In other
words, they can be described as linear combinations of a
fixed set of basis functions (notice that in different
situations such as different lighting conditions different
bases may be required and that the illuminant basis is
different, in general, from the albedo basis). This
assumption - called the spectral regularization assumption -
has been stated by several authors (for a recent review
see Maloney, 1985).

2) Typically, E(x), the "effective illumination",
changes with x more slowly than $R(x, \lambda)$ (apart from
cases when the surface changes abruptly or sharp shadows
are present); $R(x, \lambda)$ is either constant or changes
sharply (at material edges). This is the spatial
regularization assumption, essentially due to E. Land (see also
Horn, 1974).

3) Usually, the effective illumination has the same
dependence on x at each $\lambda$:

$$E(x, \lambda) = E(x)\, K(\lambda) \qquad (3)$$

This is the single source assumption because Equation
(3) is satisfied when there is a single, homogeneous
light source, even in the presence of shading. Shadows
may be a problem in some cases, because of
self-illumination effects (see Rubin and Richards, 1984).

Two additional assumptions that we are not using
directly but may be easily incorporated in our framework
are:

4) In many scenes, it may be possible to assume
that the average surface reflectance in each wavelength
band is 'grey'.

5) For many naturally occurring reflectances
(neglecting specular reflections and highlights) the ratio of
the largest value to the smallest is no larger than about
10 (Lettvin).

Assumption 1 allows us to rewrite Equation 2 as

$$S^{\nu}(x) = T^{\nu}_{ij}\, e^{i}(x)\, r^{j}(x) \qquad (4)$$

where the tensor T is defined as

$$T^{\nu}_{ij} = \int d\lambda\, a^{\nu}(\lambda)\, p^{i}(\lambda)\, q^{j}(\lambda)$$

where $\nu = 1, \ldots, 4$ and $i, j = 1, \ldots, N$, the p's and
the q's are the basis functions for the illuminant and for
the albedo, respectively, and summation over repeated
indices is tacitly assumed.
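As an illustration of Equation (4), the sketch below builds the tensor T by numerical integration and checks that the bilinear form reproduces the direct integral of Equation (2). The basis functions, sensor curves and coefficient values are hypothetical choices made only for this example; the use of four sensors and N = 3 is likewise an assumption.

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)
dlam = lam[1] - lam[0]
u = (lam - 400.0) / 300.0                 # wavelength rescaled to [0, 1]

def bump(center, width):
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

# Hypothetical sensor sensitivities a^nu(lambda), four sensor types.
a = np.stack([bump(c, 40.0) for c in (610.0, 560.0, 500.0, 450.0)])

# Hypothetical bases: p^i for the illuminant, q^j for the albedo (N = 3).
p = np.stack([np.ones_like(u), u, u ** 2])
q = np.stack([np.ones_like(u), np.cos(np.pi * u), np.cos(2.0 * np.pi * u)])

# T^nu_ij = integral of a^nu(lambda) p^i(lambda) q^j(lambda) d lambda.
T = np.einsum("vl,il,jl->vij", a, p, q) * dlam

e = np.array([1.0, 0.5, -0.2])            # illuminant coefficients e^i at one x
r = np.array([0.5, 0.2, 0.1])             # albedo coefficients r^j at the same x

# Equation (4): S^nu = T^nu_ij e^i r^j (summation over i and j).
S_tensor = np.einsum("vij,i,j->v", T, e, r)

# Direct evaluation of Equation (2) with E = e^i p^i and R = r^j q^j.
E = e @ p
R = r @ q
S_direct = (a * E * R).sum(axis=1) * dlam
```

Because T depends only on the sensors and the two bases, it can be precomputed once; the per-pixel work is then the small bilinear contraction in `S_tensor`.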

Assumption 2 means that $e^{i}(x)$ will change slowly
(apart from surface discontinuities) and $r^{j}(x)$ will be
either constant or change sharply.

Assumption 3 means that $e^{i}(x) = e^{i} g(x)$.

Thus the ratio between the intensity $S^{\nu}(x)$ measured
in one spectral channel and the intensity measured
in another should change in a noiseless situation
only at albedo boundaries (where the albedo changes),
and should be invariant for changes in the effective
illumination.
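The invariance of the channel ratio can be illustrated with a small one-dimensional simulation. This is a sketch under stated assumptions: the tensor `T`, the shading profile `g` and the two albedo coefficient vectors are invented for the example, and the image is noiseless.

```python
import numpy as np

n_pix, N = 100, 3
x = np.arange(n_pix)

# Hypothetical tensor for two sensor types (nu = 1, 2).
T = np.stack([np.eye(N), np.ones((N, N))])

# Single source assumption: e^i(x) = e^i g(x), with smooth positive shading g.
e0 = np.array([1.0, 0.3, 0.1])
g = 1.0 + 0.5 * np.sin(2.0 * np.pi * x / n_pix)
e = g[:, None] * e0                                   # shape (n_pix, N)

# Piecewise-constant albedo coefficients with one material edge at x = 50.
r_left = np.array([0.6, 0.2, 0.1])
r_right = np.array([0.3, 0.4, 0.2])
r = np.where((x < 50)[:, None], r_left, r_right)      # shape (n_pix, N)

# Equation (4) evaluated at every pixel.
S = np.einsum("vij,xi,xj->vx", T, e, r)

# The shading factor g(x) cancels in the ratio of the two channels.
ratio = S[0] / S[1]
jumps = np.flatnonzero(~np.isclose(np.diff(ratio), 0.0))
```

Each channel in `S` varies smoothly with the shading, while `ratio` is constant on either side of the material edge; `jumps` holds the single index (49, the step between pixels 49 and 50) at which the ratio changes.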

Poggio and Hurlbert are using the probabilistic
tools introduced by Geman and Geman (1984) and
developed further by Marroquin (see Marroquin et al.,
this volume) to develop algorithms that can exploit
these constraints. The basic idea is to define Markov
Random Fields associated to albedo and effective
illumination with a probabilistic structure reflecting the
relevant assumptions outlined earlier.

A (vector) MRF can be defined, corresponding to
the illuminant component and taking continuous values
e(x) on the lattice. Another MRF corresponds to
the albedo r(x) and is also continuous. The a priori
probability distributions are Gibbs distributions with
potentials $U_1(e)$ and $U_2(r)$. We choose these to satisfy
Assumption 2. For instance

$$U_1(e) = U(e_i, e_j) = \begin{cases} (e_i - e_j)^2 & \text{for } |i - j| = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$U_2(r) = U(r_i, r_j) = \begin{cases} \beta(|r_i - r_j|) & \text{for } |i - j| = 1 \\ 0 & \text{otherwise} \end{cases}$$

where $\beta(x)$ is a symmetric function that is 0 for
x = 0, is large for small values of |x|, and may go to
0 again for large values of |x| (a standard Ising model
may suffice). In this way slow changes of r are penalized,
whereas either constant values or sharp changes
carry no cost.
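The qualitative behaviour of the two prior potentials can be sketched on a one-dimensional scalar lattice. The particular function `beta` below (zero at the origin, peaked at a small difference, decaying for large differences) and its `scale` and `height` parameters are hypothetical choices for illustration; the text only constrains its general shape.

```python
import numpy as np

def U1(e):
    """Quadratic nearest-neighbour potential for the illumination field."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2))

def beta(x, scale=0.05, height=10.0):
    """Hypothetical beta: 0 at x = 0, maximal at |x| = scale, then decaying to 0."""
    t = np.abs(x) / scale
    return height * t * np.exp(1.0 - t)

def U2(r):
    """Nearest-neighbour albedo potential built from beta."""
    r = np.asarray(r, dtype=float)
    return float(np.sum(beta(np.diff(r))))

sites = np.arange(20)
constant = np.full(20, 0.5)                 # uniform albedo
step = np.where(sites < 10, 0.2, 0.8)       # one sharp material edge
ramp = np.linspace(0.2, 0.8, 20)            # slow drift between the same values

costs_albedo = {"constant": U2(constant), "step": U2(step), "ramp": U2(ramp)}
costs_illum = {"step": U1(step), "ramp": U1(ramp)}
```

With these choices the albedo potential is zero for the constant profile, nearly zero for the sharp step and large for the slow ramp, whereas the quadratic illumination potential prefers the ramp to the step.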

For the observation we assume that S(x) corresponds
to samples of the image affected by additive gaussian
noise, giving a potential term in the a posteriori
distribution of the type

$$U_0 = \left(S^{\nu} - T^{\nu}_{ij}\, e^{i} r^{j}\right)^2$$

We can exploit assumption (3) by defining two
additional MRFs, one corresponding to the albedo
discontinuities and the other to effective illumination
discontinuities (due for instance to discontinuities in the
surface or in its normal).

For this we consider the boundary indicator
$h(x) = S^{1}(x)/S^{2}(x)$ (we assume here for simplicity
$\nu = 1, 2$). Then we define the (observable) segmentation
index $d_i$ as $d_i = T(|h'_i|)$, where

$$T(x) = \begin{cases} 0 & \text{if } x < \text{threshold} \\ 1 & \text{otherwise} \end{cases}$$

We can also define an (observable) $b_i$ as
$b_i = T\left(\left|\frac{d}{dx} \sum_{\nu} S^{\nu}(x)\right|\right)$.
We then define the intensity boundary index
$c_i = b_i (1 - d_i)$, which marks sharp changes of
intensity likely to correspond to sharp changes in effective
illumination but not changes in albedo (the threshold is
set on the basis of noise estimates).

We are now ready to introduce our third MRF -
the segmentation process - that should correspond to
albedo boundaries. Its observation model provides a
potential

$$U_3(d, s) = \alpha \sum_i \left(1 - \delta(s_i - d_i)\right)$$

where $\delta$ is the delta function.

A similar potential describes our fourth MRF - the
line process (marking effective illumination boundaries)

$$U_4(c, l) = \alpha \sum_i \left(1 - \delta(l_i - c_i)\right)$$

We now couple all these MRFs together. The line
process is coupled to the effective illumination by
substituting $U_1(e)$ with

$$U'_1(e, l) = \begin{cases} (e_i - e_j)^2 (1 - l_{ij}) & \text{for } |i - j| = 1 \\ 0 & \text{otherwise} \end{cases}$$

The segmentation process is coupled to the albedo
process by

$$U'_2(r, s) = \begin{cases} \beta(|r_i - r_j|)(1 - s_{ij}) & \text{for } |i - j| = 1 \\ 0 & \text{otherwise} \end{cases}$$

where $\beta(x) = 0$ if $(r_i - r_j) = 0$ and otherwise is
large (Ising model). We may also add a cost associated
with the fact that both effective illumination edges and
albedo boundaries are continuous contours and there
cannot be too many of them (see Marroquin, 1985a, p.
129, Fig. 11). The total potential corresponding to the
a posteriori Gibbs distribution is then

$$U_P(e, r, l, s) = U_0 + U'_1 + U'_2 + U_3 + U_4$$

As a performance criterion we will use a mixed
criterion, where e and r should be as close as possible to
their true values and we should make as few errors as
possible in the assertion about the presence or absence
of boundaries l and s. Thus the optimal estimates are
the posterior mean for $e^{i}(x)$ and $r^{j}(x)$ and the
maximizer of the a posteriori marginals for $l_i$ and $s_i$
(Marroquin et al., this volume).

We can compute these estimates by using Monte Carlo
methods of the type discussed by Marroquin
(1985a, p. 132, 1985; see also Marroquin et al., this
volume).

Under some special conditions, the optimal estimation
of r, e reduces to the standard quadratic variational
principle, of the type of equation (1), developed
by Hurlbert and Poggio (see Poggio, Torre and
Koch, 1985; Hurlbert, 1985) for the computation of
color. First of all, we assume that there is no line or
segmentation process; furthermore, the p and q are
chosen so that $T^{\nu}_{ij}$ reduces $U_0$ to the form
$U_0 = [S^{\nu} - (r^{\nu} + e^{\nu})]^2$. Finally, the
potential U is assumed to have a form (for continuous x)
that involves a gaussian filter G.


If we choose as performance criterion a functional
that penalizes very much any single error, the optimal
estimate of r and e turns out to be the MAP estimate.
This corresponds in turn to r and e that minimize the
quadratic potential function U(r, e). The problem is
then of the standard regularization type. Minimization
of U is equivalent to filtering $S^{\nu}(x)$ (assuming
continuous data) through linear filters to obtain $R^{\nu}$
and $E^{\nu}$. For instance,


_ ‘ _ lw(u) 

(1  rAu'c-*1*1  f  Aw'jfl -1- Aa»J)  4- 1  1 

Although the last standard regularization algorithm
is computationally simple, this is not so for the full
stochastic model that consists of no less than four
MRFs! It is likely however that the full MRF model
could be used to refine a rough solution found (easily)
in terms of the intensity boundary and the segmentation
indices (and possibly the standard regularization
filtering).

1.5. Color computation without regularization

A non-regularization algorithm for computing spectral
reflectances has been studied by Yuille (1984). A similar
theory was developed earlier and independently by
Wandell and Maloney (1984; see Maloney, 1985). The
method for recovering the surface reflectance of an object
when the incident illumination is unknown is based
on having enough different spectral types of sensors. The
theory assumes that, for most objects viewed under normal
lighting conditions, the illumination and surface
reflectance can be expanded in terms of a finite number
of basis functions (see Equation 4). If we restrict ourselves
to Land's Mondrian world, which consists of flat
rectangular patches of different colors, each patch will
yield a number of non-linear equations for the illumination
and reflectances of each patch. As the boundary
between two adjacent patches is arbitrarily thin, the
illumination will be the same on either side of it and it is
possible to show that, given enough types of photoreceptors
in the eye, the equations for adjacent patches can
be combined to solve for the illumination (which does
not need to be constant over the whole scene) and the
reflectance functions of the two patches up to an overall
scaling factor. If there are three basis functions for
the illumination and reflectance then four types of sensors
are needed. The method may be extended to deal
with general objects by defining an edge-detection-like
operation which detects boundaries between regions of
different color and which then determines the color as
in the Mondrian case. The color thus determined on the
boundaries of objects can then be propagated inwards.
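A rough count of equations against unknowns, offered here only as a plausibility check and not as the argument of the cited work, is consistent with the figures quoted above. It assumes N basis functions for both the illumination and the reflectance, M sensor types, and two adjacent patches that share the same illumination at their common boundary; since the equations are non-linear, the count gives a necessary condition only.

```latex
% N = number of basis functions (assumed equal for illumination and reflectance)
% M = number of sensor types; two adjacent patches, one shared illumination
\begin{align*}
\text{unknowns}  &= \underbrace{N}_{\text{illumination}}
                  + \underbrace{2N}_{\text{two reflectances}}
                  - \underbrace{1}_{\text{overall scale}} = 3N - 1, \\
\text{equations} &= 2M, \\
2M \ge 3N - 1 \;&\Longrightarrow\; M \ge \tfrac{3N - 1}{2},
\qquad N = 3 \;\Rightarrow\; M \ge 4 .
\end{align*}
```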

1.6. Learning a regularization algorithm


If we want to develop a powerful and flexible vision
system we need ideally to endow it with the capability of
improving its algorithms from experience and learning
new ones from examples. It is in fact possible to learn
some standard regularizing algorithms from examples
without having to formulate explicitly the variational
principle that embeds the relevant physical constraints
(Hurlbert and Poggio, 1984). This is especially important
because the exact form of the relevant constraints
depends on the specific type of situation. For instance,
the regularizing constraints on the spectral properties
of illumination for solving the problem of computing
surface reflectances are somewhat different for outdoor
vs. indoor situations. The capability to learn the exact
form of the regularization algorithm from examples
could provide an efficient way of approaching this
problem.

Minimization of the regularization principle Equation 1
corresponds to a regularizing operator, i.e. a system,
acting on the input data y and providing, as an
output, the regularized solution z. We want to show
that this regularizing operator can be synthesized by
associative learning from a set of examples. Our argument
consists of two simple claims. The first claim is that the
regularizing operator corresponding to quadratic variational
principles is linear. The second one is that any
linear mapping between two vector spaces can be
synthesized by an associative scheme based on the
computation of the pseudoinverse of the data. We explain
now in more detail our argument.

(a) Variational principles of the form of expression
(1) provide a regularized solution as a linear transformation
of the data. The reason for this is that the
Euler-Lagrange equations are linear partial differential
equations and therefore define the solution z as a linear
functional of the data y (depending on the boundary
conditions). For simplicity, we will restrict ourselves in
this paper to the discrete case of Equation 1, in which
z and y are vectors, and A and the Tikhonov stabilizer
P are matrices and A does not depend on the data.
Equation 1 becomes then

$$\|Az - y\|^2 + \lambda \|Pz\|^2, \qquad (5)$$

where $\|\cdot\|$ is a norm. The minimum of this functional
will occur at its unique stationary point z. Setting
to zero the gradient of the functional of Equation 5 gives
the minimum vector z as the solution of the system of
linear equations

$$(A^T A + \lambda P^T P)\, z = A^T y \qquad (6)$$

It follows that the solution z can be written as

$$z = S y \qquad (7)$$

and is therefore a linear transformation on the data
vector y. It is important to notice that the linear operator
(when it is not space invariant, i.e. a convolution
operator) may depend in general on the given lattice of
data points.
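Equations (5) to (7) translate directly into a few lines of linear algebra. The sketch below is one possible instance, assuming a hypothetical sampling operator A (every fifth node of a 50-node lattice) and a second-difference stabilizer P; neither choice is taken from this report.

```python
import numpy as np

def regularizing_operator(A, P, lam):
    """Return S such that z = S y minimizes ||A z - y||^2 + lam ||P z||^2.

    S solves the normal equations (A^T A + lam P^T P) S = A^T of Equation (6).
    """
    lhs = A.T @ A + lam * (P.T @ P)
    return np.linalg.solve(lhs, A.T)

n = 50
A = np.eye(n)[::5]                     # hypothetical: observe every 5th node, shape (10, 50)
P = np.diff(np.eye(n), n=2, axis=0)    # second-difference stabilizer, shape (48, 50)
lam = 0.1

S = regularizing_operator(A, P, lam)   # shape (50, 10); depends on the lattice via A

# Sparse, noisy samples of a smooth profile, and their regularized reconstruction.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, np.pi, n))
y = A @ truth + 0.01 * rng.standard_normal(A.shape[0])
z = S @ y                              # Equation (7)
```

The operator `S` is computed once for a given lattice and value of `lam`; every new data vector is then mapped to its regularized solution by a single matrix-vector product, which is the linearity claimed in (a).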

(b) Imagine now that a set of noiseless input vectors
$y^{i}$ are available together with the corresponding
regularized solutions $z^{i}$. Arrange these vectors in two
matrices Y and Z. The problem of synthesizing the
regularizing operator S that provides the regularized
solution $z^{i}$ for each vector $y^{i}$ is then equivalent to
"solving" the following equation

$$Z = S Y \qquad (8)$$

and finding the matrix S. A general solution to
this problem, that is optimal in the least-squares sense,
is provided by

$$S = Z Y^{+} \qquad (9)$$

where $Y^{+}$ is the pseudoinverse of Y. This is the
solution which is most robust against errors, if equation
8 admits several solutions and it is the optimal solution
in the least-squares sense, if no exact solution of
equation 8 exists. It is of particular interest for practical
applications that the pseudoinverse can be computed in
an adaptive way by updating it when new data become
available. Practically the numerical computation of the
pseudoinverse may be ill-conditioned and it is therefore
advisable to regularize it by using Tikhonov's method
(Tikhonov and Arsenin, 1977). We had good results in
the "regularized" implementation of the learning of the
color algorithm.
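The learning scheme of Equations (8) and (9) can be sketched as follows. The "teacher" operator, the random training inputs and the ridge-style function `learn_tikhonov` are assumptions made for this illustration; in particular, the Tikhonov-regularized formula is one common reading of "regularizing the pseudoinverse", not necessarily the implementation used for the color algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_examples = 30, 200

# Hypothetical teacher: the operator of Equations (6)-(7) with A = I and a
# first-difference stabilizer P. The learner never sees A, P or lam.
P = np.diff(np.eye(n), axis=0)
lam = 2.0
S_true = np.linalg.solve(np.eye(n) + lam * (P.T @ P), np.eye(n))

Y = rng.standard_normal((n, n_examples))   # columns are input vectors y^i
Z = S_true @ Y                             # columns are regularized solutions z^i

# Equation (9): least-squares synthesis of the operator from examples.
S_pinv = Z @ np.linalg.pinv(Y)

def learn_tikhonov(Y, Z, mu):
    """Regularized variant: S = Z Y^T (Y Y^T + mu I)^(-1)."""
    G = Y @ Y.T + mu * np.eye(Y.shape[0])
    # G is symmetric, so transposing G^(-1) Y Z^T gives Z Y^T G^(-1).
    return np.linalg.solve(G, Y @ Z.T).T

S_reg = learn_tikhonov(Y, Z, mu=1e-6)
```

With more examples than lattice points and noiseless pairs, `S_pinv` recovers the teacher operator to rounding error; `S_reg` trades a small bias, controlled by `mu`, for better conditioning when `Y` is nearly rank deficient.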

Equation 9 shows that the standard regularizing
operator S (parametrized by the lattice of data points)
can be synthesized without need of an explicit variational
principle, if a sufficient set of correct (in the
regularization sense) input-output data pairs is available
to the system.

We plan to study the extension of this linear learning
scheme to nonquadratic regularization principles (of
the type described earlier). An obvious scheme simply
involves finding the nonlinear operator that minimizes
an appropriate "distance" between the data and the
solution set.

1.7. Parallel algorithms: Concurrent
multigrid coordination

Terzopoulos (1983, 1985a) has shown that multilevel
relaxation methods are an effective tool for designing
highly efficient optimization algorithms for early vision.
Algorithms of this type have been developed for
computing useful multiscale regularization solutions to
problems in image analysis and in the computation of
3D surfaces from images. The algorithms are amenable
to implementation on fine-grained, massively parallel
hardware, such as the Connection Machine.

In multilevel algorithms, the primary relaxation
operations on each of the levels must be coordinated
consistent with optimizing the given objective functional.
Originally, we employed a standard, recursive multilevel
coordination strategy which proves to be very effective
on sequential computers. However, the recursive
strategy activates only a single level at any given time.
Hence, it makes rather poor use of available processors
in a highly parallel implementation.


Terzopoulos (1985b) has developed a new multilevel
coordination strategy that exploits a greater degree
of parallelism. In contrast to the recursive coordination
schemes, the new coordination strategy is fully
concurrent; it keeps processors on all the levels busy
performing simultaneous relaxation operations.

The concurrent coordination strategy then aims to
optimize a multilevel energy functional consisting of the
sum of three terms: (1) a summation of the discrete
form of the given functional on each level of a multigrid
hierarchy, (2) a summation of functionals coupling each
level (except the finest) to the next finer level, and (3) a
summation of functionals coupling each level (except the
coarsest) to the next coarser level. The interlevel coupling
functionals are designed so that the scheme will be
convergent. Each involves a parameter. The coupling
parameters are modified during the iterative process
such that there is an initially strong but gradually
weakening coarse-to-fine interaction, which accelerates
convergence, and an initially weak but gradually
strengthening fine-to-coarse interaction, which yields
consistent accuracy on all levels.

In addition to making full use of all available
processing elements, the concurrent strategy is significantly
easier to implement than the recursive schemes, not only
on parallel, but even on conventional computers.


1.8. Parallel hardware for regularization


Our discussion suggests a classification of vision
algorithms that maps naturally into parallel digital
computer architectures that are now under development.
Standard regularization, when sufficient, leads to two
classes of parallel algorithms. Algorithms for finding
minima of a convex functional such as steepest descent
or the more efficient multigrid algorithms developed for
vision (see previous section) can always be used. They
can be replaced by convolution algorithms if the data
are given on a regular grid and the operator A in
Equation 3 is space-invariant. In the latter case, the
regularized solution is obtained by convolving the data through
a precomputed filter. The MRF approach leads to
algorithms either of the Metropolis type or specific for
the problem (Marroquin, 1985). All these algorithms
may be implemented by parallel architectures of many
processors with only local connections and by hybrid
computer architectures (Marroquin et al., this volume;
Poggio, Torre and Koch, 1985; Koch, Marroquin and
Yuille, 1985).


2.  Shape  representation 

Brady and his associates continued to develop a
new representation of two-dimensional shapes called
smoothed local symmetries (Brady and Asada, 1984).
This work, described in previous reports, represents
both the bounding contour of a shape fragment and
the region that it subtends or encloses. It involves
constructing a representation, the curvature primal sketch
(Asada and Brady, 1984), of the significant changes in
curvature along the contours of the shape. The work
described here extends the representation to a larger class
of shapes, including surfaces, and shows how to generate
complex semantic network descriptions of objects,
which can then be learnt. The representation has been
successfully used in applications in vision and robotics.
In work reported elsewhere Anita Flynn (Flynn, 1985)
has successfully adapted the curvature primal sketch
description for robot navigation involving multiple
sensors.

2.1. Local Rotational Symmetries

Smoothed Local Symmetries provide stable and
perceptually appropriate representations only for regions
that are more or less elongated. Fleck (1985) has
developed a companion representation, called Local
Rotational Symmetries, for round regions, including
irregularly circular or oval regions, round bumps and ends of
elongated regions, hexagons, spirals, and round regions
broken up by attachment or occlusion. Like some
implementations of Smoothed Local Symmetries, her
implementation of Local Rotational Symmetries computes
representations of grey-scale images at multiple
resolutions. A new feature of this implementation is that it
allows the regions found at one resolution to guide
analysis of the image at finer resolutions. Thus, exhaustive
computation can be done only locally at each resolution,
which makes the computation more efficient and
suppresses computation of certain symmetries which
are not perceptually salient. Further, to create
representations at multiple resolutions, this implementation
smooths the grey-scale image before extracting region
boundaries, rather than smoothing region boundaries.
This type of smoothing proves to be more robust on
real input images. A companion re-implementation of
Smoothed Local Symmetries is being developed.

2.2.  Towards  a  Surface  Primal  Sketch 

Ponce and Brady (Ponce and Brady, 1985) continued
previous investigation of surface descriptions based on
concepts of differential geometry (Brady, Ponce, Yuille,
and Asada, 1985). They implemented the surface primal
sketch, a representation of significant surface changes
in dense depth maps analogous to Marr's (Marr, 1976)
primal sketch representation of image intensity changes.
The implemented program detects, localizes and
symbolically describes: (1) steps, where the surface height
function is discontinuous, (2) roofs, where the surface
normal is discontinuous, (3) smooth joins, where a
principal curvature is discontinuous and (4) shoulders
consisting of two roofs and a step viewed obliquely. The
program performs well on range maps, generated by
laser data, of objects of varying complexity.

2.3.  Learning  Shape  Descriptions 

Brady and Connell have recently combined the work
on the Smoothed Local Symmetries shape representation
(Brady and Asada, 1984) with a modified form of
Winston's learning system (Winston, 1981, Winston,
1982). The resulting program generates complex semantic
network descriptions of objects directly from images
(Connell and Brady, 1985a, Connell and Brady, 1985b,
Connell, 1985). Furthermore, from a series of examples
the system produces a model of the objects it has
seen by generalizing their descriptions. These models
are then used to recognize other members of the class.

Using this system, the relation between an object's form
and its function has been briefly explored (Brady, Agre,
Braunegg, and Connell, 1984). An important focus of
the shape representation work has been to determine
methods for segmenting an imaged object into parts
which can be described in the semantic network formalism.
This decomposition is guided by the principles of
smooth continuation and compactness of regions. Another
focus has been to identify various types of structural
approximation and ways to detect them. As suggested
by Marr (Marr and Nishihara, 1978) such a hierarchical
structuring is useful in matching objects to
class models. Four forms of structural abstraction have
been found to be particularly useful and have been
incorporated into the learning and matching algorithms
used by the system.


3.  Object  recognition 

Grimson and Lozano-Perez have continued their work
on object recognition from sparse, noisy sensory data.



Previously,  we  reported  on  a  technique  for  recognizing 
occluded  objects,  which  assumes  polyhedral  models  of 
the  objects  of  interest,  and  simple  measurements  of  the 
position  and  surface  orientation  of  small  patches  of  sur¬ 
face. The technique searches for consistent matchings
between  the  faces  of  the  object  models  and  the  sensory 
measurements,  using  simple  geometric  constraints  in  a 
standard  backtracking  tree  search. 
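The flavour of such a constrained backtracking search can be conveyed by a small sketch. It is a simplified illustration, not the implementation described here: it uses only one pairwise constraint (the angle between measured surface normals must match the angle between the normals of the assigned model faces), and the function names, the tolerance `tol` and the three-face model are hypothetical.

```python
import numpy as np

def angle(u, v):
    """Angle in radians between two unit vectors."""
    return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

def interpretations(data_normals, face_normals, tol=0.05):
    """Backtracking search over assignments of data patches to model faces.

    A partial assignment is extended only if, for every earlier data patch,
    the angle between the two measured normals agrees (within tol radians)
    with the angle between the normals of the two assigned faces.
    """
    results = []

    def extend(partial):
        k = len(partial)
        if k == len(data_normals):
            results.append(tuple(partial))
            return
        for f in range(len(face_normals)):
            consistent = all(
                abs(angle(data_normals[k], data_normals[j])
                    - angle(face_normals[f], face_normals[partial[j]])) <= tol
                for j in range(k)
            )
            if consistent:
                partial.append(f)
                extend(partial)      # descend one level in the tree
                partial.pop()        # backtrack

    extend([])
    return results

def unit(deg):
    """Unit normal in the z = 0 plane at the given angle in degrees."""
    t = np.deg2rad(deg)
    return np.array([np.cos(t), np.sin(t), 0.0])

# Hypothetical model with three faces whose pairwise normal angles all differ.
faces = [unit(0.0), unit(90.0), unit(210.0)]

# Measurements: the object rotated by an unknown angle, patches taken from
# faces 2, 0 and 1 in that order.
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
data = [Rz @ faces[2], Rz @ faces[0], Rz @ faces[1]]

matches = interpretations(data, faces)
```

Because the angle constraint is invariant to the unknown rotation, inconsistent branches are pruned as soon as a pair of assignments disagrees, and `matches` contains only the assignment (2, 0, 1) for this example.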

In the past year we have extended our work in several
ways. First, we have performed theoretical analysis
on the expected combinatorial efficiency of the technique,
and have shown that the results of extensive simulations
of the process are consistent with expected theoretical
bounds. Second, we have tested the technique
on several different types of real data, including sonar,
laser, tactile and visual data.

Third, we have investigated alternate types of simple
geometric constraints. In particular, the original
technique used a set of decoupled constraints, by
considering independently (1) distances between faces, (2)
angles between face normals, and (3) components of
vectors between faces in the direction of the face normals.
While these are simple constraints to implement, and
are remarkably effective at reducing the space of possible
solutions, they do not completely solve the problem,
since they only apply to pairs of faces, and not to the
interpretation as a whole. We have considered coupled
constraints of the same form as an alternative. Here,
rather than simply testing whether the assignment of
two data points to a pair of object faces is consistent
with the measurements, we actually compute the range
of possible positions along the face that the point could
take. These ranges are propagated as additional points
are added to the interpretation, so that each new
constraint tends to reduce the range of possible positions.
This continues until either there is no feasible range of
positions, or until all the data are accounted for.
Experiments with these coupled constraints indicate that
while the portion of the search space which must be
explored is reduced, the additional computational cost of
performing that search tends to outweigh the advantages
of the reduced search.

Fourth, we have investigated additional strategies
for reducing the amount of search required to find the
interpretation. In particular, we have added configuration
hashing techniques as a method for rapidly selecting
small portions of the search space that are likely
to lead to consistent interpretations. These techniques
apply to both two-dimensional and three-dimensional
problems, and significantly improve the performance of
the algorithm, without loss of accuracy.


Fifth, we have investigated techniques for automatically
selecting additional places for obtaining sensory
data. Since we are using only sparse, noisy data, it is
frequently the case that more than one interpretation
is consistent with that data. To completely solve the
recognition and localization problem, we need to acquire
additional sensory data, until only one interpretation is
feasible. While random acquisition processes will
eventually converge to a unique interpretation, we have also
considered techniques that will optimally select additional
sensing positions for disambiguating multiple
interpretations. These techniques have been implemented
and successfully tested.


4.  Towards  visuo-motor  coordination 

4.1. An eye-head system

A robot in a complex visual environment, where objects
and the robot are allowed to move relative to each
other in three dimensions, may require a sophisticated
system for relocating the lines of sight of its binocular
vision system. The requirements for such a system may
be not unlike those for the primate oculomotor system:
(1) locate and fixate objects of interest,
(2) stabilize the images of such objects despite object
or self movement, and (3) reduce the bandwidth
of visual information by the use of a small, high-density
photoreceptor array which requires sequential relocation
to different spots in a wide field of view. The ability to
relocate the lines of sight may, by offering several related
views, provide simplifying constraints on visual
computations that are otherwise ambiguous when performed
on a single static image. To study these issues we have
designed and built an eye-head system. The system will
be the input to our "Vision machine", which we are now
developing. In particular, it will allow high-level processes
to direct gaze and attention to specific parts of
the 3D scene.

The MIT eye-head robot consists of a platform
upon which are mounted two Hitachi solid state cameras
(250 by 320 pixel array) and four rotatable mirrors, two
for each camera. The platform has two axes of rotation
(shown in Figure 1). Stepping motors act along these
axes to change the pan and pitch angles of the platform
on which the cameras are mounted. In front of
each camera are two mirrors, one immediately in front,
called the inner mirror, and one to the side, called the
outer mirror. Each mirror may be deflected by a galvanometer
upon which it is mounted. Thus, the mirrors
of one camera provide “vertical” and “horizontal” deflection
of its line of sight which is independent of the
pan and pitch angles of the motors, and of the mirrors
in front of the other camera. The motors and the mirrors
provide redundant degrees of freedom for tracking
in 3-D space. The motors are limited to moving both
cameras at once, so the mirrors must be used at least
to verge the lines of sight to the same point in space.

Using 25 mm lenses, the area of any one image
corresponds to approximately 17 deg vertically (the direction
that is associated with deflections of the inner
mirror) and 17 deg horizontally (the direction that is associated
with deflections of the outer mirror). Each line
of sight may be deflected over a vertical range of roughly
18 deg by the inner mirror, and a horizontal range of
18 deg by the outer mirror. The ability to rotate the
entire platform using the stepping motors greatly extends
the range of angles which the cameras can view.
The motors can rotate the platform through pitch angles
of ±70 deg from level and through pan angles of
±180 deg.

To move the eye-head robot six devices must be
controlled: four galvanometers and two stepping motors.
A central computer, the LISP machine, accesses
the image input via a frame grabber board and employs
a digital control algorithm to generate mirror position
commands and motor speed commands (Figure 2). A
D/A converter changes the mirror commands into analog
voltages in the range which the four galvanometer
controllers accept as input for determining the positions
of the mirrors. The LISP machine sends the stepping
motor commands via a parallel digital bus to an Intel
8031 microprocessor-based controller. The microprocessor
decodes the 8-bit command and handles the I/O
intensive task of generating timed step sequences for the
motors.

K. Cornog, T. Poggio and K. Nishihara implemented
control algorithms for the guidance of “eye”
and “head” movements during the tasks of fixating and
tracking an object in 3-D. Our algorithms, executed in
LISP, allow the eye-head robot to track an object moving
at up to 15 deg/sec and to hold the image stable to
within 3 pixels. K. Cornog has reviewed control strategies
of the primate oculomotor system. She has
compared the control and performance of the robot's
binocular fixation and tracking systems to those of the
smooth pursuit, saccadic and eye-head coordination systems
of the primate. They find, for instance, that a
control system including both positional and velocity
feedback improves the ability of the robot to fixate and
stabilize the image of a moving target.
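
The benefit of combining positional and velocity terms can be seen in a small simulation. This is a minimal sketch under stated assumptions, not the control law used on the robot: the gaze is modeled as a single angle updated at a hypothetical 30 Hz rate, the gain `kp` is arbitrary, and the velocity term is estimated from successive target positions. Only the 15 deg/sec target speed comes from the text.

```python
def track(kp, use_velocity, speed=15.0, dt=1.0 / 30.0, steps=90):
    """Follow a constant-velocity target; return the final gaze error in deg."""
    gaze = 0.0
    target = 0.0
    prev_target = None
    for _ in range(steps):
        error = target - gaze  # positional term
        if use_velocity and prev_target is not None:
            target_rate = (target - prev_target) / dt  # velocity term
        else:
            target_rate = 0.0
        gaze += (kp * error + target_rate) * dt  # commanded gaze velocity
        prev_target = target
        target += speed * dt
    return abs(target - gaze)


lag_position_only = track(kp=5.0, use_velocity=False)
lag_with_velocity = track(kp=5.0, use_velocity=True)
```

In this toy model the position-only controller settles to a constant lag of speed/kp (3 deg here), while adding the velocity term drives the lag toward zero, which is consistent with the qualitative finding reported above.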


4.2.  Catching  a  ball 

We are very adept at using the purely two-dimensional
information we get from our retinae to maneuver and react
to the three-dimensional world: witness the tennis
player returning a 100 m.p.h. serve. How we manage
to reconstruct the three-dimensional character of the
world from these two-dimensional representations has
been a lively subject of research in the last ten or fifteen
years. One principle that has emerged unifying many
of these ideas is the need for constraints to allow the visual
system to interpret the images it receives as three-dimensional.
Those constraints come from assumptions
about the nature of the situation that produced the image.

Saxberg and Poggio (Saxberg, 1985) have looked at
how gravity can be used as a constraint in the case of
a free fall trajectory projected onto an image plane by
central projection. We showed that in principle there
is enough information in the time dependent projected
trajectory to exactly reconstruct the original parabolic
trajectory in three dimensions.

We examined several methods for deriving the initial
conditions of the trajectory from the trajectory even
in the presence of noise. Two techniques turned out
to function quite well on simulated trajectories in the
presence of noise: one uses a filter whose width varies
with time to filter out noise from the trajectory data
as it accumulates; the other uses a least-squares technique
to solve directly for the initial condition parameters
from the accumulating noisy projected trajectory.
In both cases, good estimates of the initial conditions
were achieved from simulated projected trajectory data
even in the presence of considerable noise. Performance
depended on the amount of noise added, the sampling
rate, and the duration of the trajectory.
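
To make the least-squares idea concrete, the following sketch recovers the six initial-condition parameters from a projected free-fall trajectory. It is one possible linear formulation written for illustration, not necessarily the one in (Saxberg, 1985). It assumes a known focal length f, a known g, and an image y axis opposite to gravity. Writing u = fX/Z and v = fY/Z with X = X0 + Vx t, Y = Y0 + Vy t - g t^2/2 and Z = Z0 + Vz t gives two equations per sample that are linear in the unknowns; the gravity term supplies the non-zero right-hand side that fixes the overall scale.

```python
def project(state, t, f=1.0, g=9.8):
    """Central projection of a free-fall point at time t."""
    x0, vx, y0, vy, z0, vz = state
    z = z0 + vz * t
    return f * (x0 + vx * t) / z, f * (y0 + vy * t - 0.5 * g * t * t) / z


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_free_fall(samples, f=1.0, g=9.8):
    """Least-squares estimate of [X0, Vx, Y0, Vy, Z0, Vz] from (t, u, v)."""
    rows, rhs = [], []
    for t, u, v in samples:
        rows.append([f, f * t, 0.0, 0.0, -u, -u * t])  # u * Z = f * X
        rhs.append(0.0)
        rows.append([0.0, 0.0, f, f * t, -v, -v * t])  # v * Z = f * Y
        rhs.append(0.5 * f * g * t * t)                # gravity sets the scale
    n = 6
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    return solve_linear(ata, atb)  # normal equations


true_state = (1.0, 2.0, 1.0, 4.0, 5.0, -1.0)  # hypothetical test trajectory
samples = [(k / 24.0,) + project(true_state, k / 24.0) for k in range(25)]
estimate = fit_free_fall(samples)
```

With noise-free synthetic samples the estimate matches the hypothetical initial conditions closely; with noisy data the same equations would simply be solved in the least-squares sense as more samples accumulate.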

Saxberg also ran a limited test of the two methods
on image data. In one case, he simulated the image
data of a high contrast ball traveling in a parabola; in
another, he used an actual video tape of a tennis ball.
We applied a very simple thresholding and chord drawing
technique to identify the center of the ball in each
image, and used this projected trajectory information
as the input to the two routines described above. With
the synthesized image data, both techniques gave estimates
of initial conditions that were within 5-10 per
cent of the true initial conditions; with the video tape
of the thrown ball, where initial conditions were not
easily measured, the two techniques gave plausible estimates
which agreed very closely with each other.
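
The text does not spell out the thresholding and chord drawing step, so the following is only a guess at its general shape, offered as a minimal sketch. It assumes a single bright, roughly convex ball on a dark background: the image is thresholded, the widest horizontal chord gives the center column, and a vertical chord through that column gives the center row. The function name and the synthetic test image are hypothetical.

```python
def ball_center(image, threshold):
    """Estimate the (column, row) center of one bright blob using two chords."""
    mask = [[pixel > threshold for pixel in row] for row in image]
    best = None
    for r, row in enumerate(mask):
        cols = [c for c, on in enumerate(row) if on]
        if cols and (best is None or len(cols) > best[0]):
            best = (len(cols), cols[0], cols[-1])  # widest horizontal chord
    if best is None:
        return None  # nothing above threshold
    _, c_first, c_last = best
    cx = (c_first + c_last) / 2
    column = int(round(cx))
    rows = [r for r in range(len(mask)) if mask[r][column]]
    cy = (rows[0] + rows[-1]) / 2  # midpoint of the vertical chord
    return cx, cy


# Synthetic 15 x 15 frame: a disk of radius 4 centered at row 6, column 8.
frame = [[200 if (r - 6) ** 2 + (c - 8) ** 2 <= 16 else 10 for c in range(15)]
         for r in range(15)]
center = ball_center(frame, threshold=100)
```

Applied frame by frame, centers found this way would form the projected trajectory that is fed to the two estimation routines.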




5.  Visual  Routines

Our work on visual routines addresses the second major
stage of visual information processing - the application
of the information in the early representations to object
recognition, visually-guided manipulation and more abstract
visual thinking. A fundamental requirement on
the processing at this stage is the robust and efficient
computation of an open-ended variety of shape properties
and spatial relations. Humans surpass by far the
ability of current automated methods of spatial analysis
to cope with visual problems such as "Is there a dot
inside a closed curve?", "Is the figure labelled A above
and to the right of the one labelled B?", or "Which of
the figures are collinear?" Both in terms of speed and
the class of inputs handled, the performance difference
is huge. The limitations of current approaches to spatial
analysis seem to stem from a lack of appropriate
representations and algorithms, as well as a need for
specialized parallel architectures.

To address the analysis of spatial relations, Ullman
(1984) proposed the notion of visual routines - sequences
of  elemental  spatial  operations,  drawn  from  a  small, 
fixed  set,  which  are  applied  to  the  early  visual  repre¬ 
sentations.  The  set  of  elemental  operations  should  be 
such  that,  by  combining  them  in  different  ways,  routines 
for  an  open-ended  set  of  abstract  spatial  relations  and 
properties  can  be  defined.  These  elemental  operations 
must  be  powerful,  robust,  and  very  efficient. 

Ullman's proposal raises four major questions for
research. First, what set of basic operations will support
the spatial analysis computations which are needed
in the course of recognition, visual problem-solving, etc.?
Second, how are these elemental operations integrated
into routines for establishing specific relations, and what
are the general principles governing this integration?
Third, by what means can visual routines be selected
and controlled - for example, in the course of processing
a scene, what triggers them, and how is their order of
execution determined? Finally, how can visual routines
be assembled or modified to meet new requirements?

Mahoney's work so far has focussed on the first two
problems. Ullman (1984) made some specific suggestions
for basic operations, including shift of the processing
focus, indexing "odd-man-out" locations, boundary
tracing, area coloring, and location marking. Working
from these, we detailed possible visual routines for a
range of visual tasks which were posed in the context of
schematic drawings. These tasks are related, for example,
to the interpretation of terrain maps, and, in certain
cases, the interpretation of edge images derived from
real scenes. Building on a very simple implementation
of the preceding basic operations, Mahoney has tested
a number of routines proposed for solving problems like
"count the dots that are inside curves", "find a curve
nesting two or more other curves", "find a location that
is not inside any curve", and some simple figure/ground
separation tasks. This line of work is mainly aimed at
exposing issues pertaining to the integration of the proposed
basic operations into useful visual routines, and
providing further and more detailed requirements on the
set of basic operations. A longer term goal is to implement
what could be thought of as a "programming
system" for spatial analysis, applicable in a variety of
practical contexts. The system would provide a set of
basic operations and general purpose visual routines.
The visual routines for a particular application would
build on these. We plan to demonstrate this idea for the
domain of simple terrain maps in particular. A system
of this type could be used "stand-alone", or it might be
integrated into a larger vision machine.
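
As an illustration of how one elemental operation can answer a spatial question, the sketch below uses area coloring to decide whether a marked location lies inside a closed curve. It is a straightforward serial implementation written for this example and is not Mahoney's code; the character-grid representation ('#' for curve pixels), the 4-connected spreading rule and the function name are all assumptions.

```python
from collections import deque


def inside_closed_curve(grid, start):
    """Color the area reachable from `start`; reaching the border means outside.

    `start` is a (row, column) location that is not itself on a curve.
    """
    rows, cols = len(grid), len(grid[0])
    colored = {start}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        if r in (0, rows - 1) or c in (0, cols - 1):
            return False  # the coloring leaked to the image border
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if nxt not in colored and grid[nxt[0]][nxt[1]] != "#":
                colored.add(nxt)
                frontier.append(nxt)
    return True  # coloring was contained by a curve


FIGURE = [
    ".......",
    ".#####.",
    ".#...#.",
    ".#...#.",
    ".#####.",
    ".......",
]
```

A routine such as "find a location that is not inside any curve" could be assembled from the same operation by testing candidate locations in turn. Fragmented boundaries would let the coloring leak, which is one of the difficulties for straightforward implementations noted in the following paragraph.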

These experiments also highlight the need for novel
approaches, at the level of representation and algorithm,
to providing operations such as boundary tracing or
area coloring. It is easy to generate examples which
would present difficulties to the straightforward implementations
of these operations. The main thrust of our
research is to invent algorithms and supporting representations
for very fast and general boundary tracing. It
is common for boundaries to be fragmented, superimposed
upon background figures, or comprised of very abstract
curvilinear structures, such as texture changes or proximity
groupings of small figures. A general boundary
tracing operation must cope well with all of these cases,
without suffering a substantial sacrifice in processing
rate. Similar considerations also apply to the area coloring
operation.

We are also exploring local, parallel methods of detecting
blobs - areas in the input that are significantly
different from their surroundings - along with a measure
of how conspicuous they are. The goal of this processing
is to enable initial analysis to be applied selectively
to these interesting areas, and, sometimes, to make the
boundaries of these regions explicit for later input to
tracing operators. For example, in the processing of
a complex scene, it would often be useful for the routines
which initiate recognition to be directed first to
areas which correspond to the larger or otherwise more
prominent objects, rather than to the finer details.

Related to this, Koch and Ullman (1984) have addressed
the problem of how simple networks can account
for selective shifts in visual attention. They proposed,
as one of the early representations, a pointwise representation
of saliency in various local properties such as
color, orientation, direction of movement, disparity, etc.
A selective mapping exists between this saliency representation
and a central representation, such that at any
given time the latter contains the properties of only a
single location in the visual field. The main selection
criterion is saliency, and Koch and Ullman proposed
implementation by a winner-take-all network built on
a specific hierarchical, pyramid-like architecture most
of whose connections are local, and whose processing
elements perform only simple operations such as addition
or multiplication, and do not process symbolic
information such as addresses. They also suggested additional
selection rules which can account for similarity
and proximity effects, and changes in the selected location
in time.
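
The pyramid-style selection can be imitated in a few lines. The sketch below is a serial caricature under stated assumptions, not the network Koch and Ullman describe: saliency is a one-dimensional list whose length is a power of two, each pyramid level keeps the larger of two neighbors, and the winner is located by descending from the top. Unlike the proposed network, this sketch does manipulate indices explicitly. Shifts of attention over time are mimicked by suppressing each winner once it has been selected.

```python
def winner_take_all(saliency):
    """Return the index of the most salient location via a max pyramid."""
    levels = [list(saliency)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    index = 0
    for level in reversed(levels[:-1]):  # walk back down toward the input
        left, right = level[2 * index], level[2 * index + 1]
        index = 2 * index + (1 if right > left else 0)
    return index


def attention_sequence(saliency, count):
    """Successive selections, suppressing each winner after it is chosen."""
    remaining = list(saliency)
    order = []
    for _ in range(count):
        winner = winner_take_all(remaining)
        order.append(winner)
        remaining[winner] = float("-inf")  # crude suppression of the winner
    return order


saliency_map = [0.1, 0.4, 0.2, 0.9, 0.3, 0.0, 0.5, 0.6]
```

For the hypothetical saliency values above, attention visits location 3 first, then 7, then 6.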

READING  LIST 

Asada, Haruo, and Michael Brady. “The curvature
primal sketch,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 758, 1984.

Brady, Michael, and Haruo Asada. “Smoothed local
symmetries and their implementation,” Int. J. of
Robotics Research, 3 (3), 1984.

Brady, Michael, Philip Agre, David Braunegg, and
Jonathan Connell. “The Mechanic's Mate,” ECAI 84:
Advances in Artificial Intelligence, T. O'Shea (ed.),
Elsevier Science Publishers B.V., North-Holland,
Amsterdam, 1984.

Brady, Michael, Jean Ponce, Alan Yuille, and
Haruo Asada. “Describing surfaces,” Computer Vision,
Graphics, and Image Processing, 1985.

Brady, Michael. “Representing shape,” Proc. IEEE
Conference on Robotics, Atlanta, 1984.

Brady, Michael, and Alan Yuille. “An extremum
principle for shape from contour,” IEEE Trans. Pattern
Analysis and Machine Intelligence, 6 (3), 1984.

Brady, Michael, and Alan Yuille. “Representing
three-dimensional shape,” Romansy Conference, Udine,
Italy, 1984.

Brooks, M.J., and Horn, Berthold K.P. “Shape and
Source from Shading,” A.I. Memo 720, 1985.

Buchsbaum, G. “A spatial processor model for object
color perception,” J. Franklin Inst., 310, 1980.

Canny, John F. “Finding edges and lines,” Massachusetts
Institute of Technology Technical Report 720,
1983.

Connell, Jonathan H., and Michael Brady. “Generating
and generalizing models of visual objects,” Massachusetts
Institute of Technology Artificial Intelligence
Laboratory Memo 823, 1985.

Connell, Jonathan H., and Michael Brady. “Learning
shape descriptions,” IJCAI 85 Proceedings, 1985.

Cornog, Katherine H. “Smooth Pursuit and Fixation
for Robot Vision,” Massachusetts Institute of Technology
Department of Electrical Engineering and Computer
Science Master's Thesis, 1985.

Fleck, Margaret. “Local Rotational Symmetries,”
Massachusetts Institute of Technology Department of
Electrical Engineering and Computer Science Master's
Thesis, 1985.

Flynn, Anita M. “Redundant sensors for mobile
robot navigation,” Massachusetts Institute of Technology
Department of Electrical Engineering and Computer
Science Master's Thesis, 1985.

Geman, Stuart, and Don Geman. “Stochastic relaxation,
Gibbs distributions, and the Bayesian restoration
of images,” IEEE Trans. Pattern Analysis and Machine
Intelligence, 6, 1984.

Grimson, W.E.L. From Images to Surfaces,
Massachusetts Institute of Technology Press, Cambridge,
Mass., 1981.

Grimson, W.E.L. “Surface consistency constraints
in vision,” Computer Vision, Graphics, and Image Processing,
24, 1983.

Grimson, W.E.L. “A computational theory of visual
surface interpolation,” Phil. Trans. R. Soc. London
B, 1982.

Grimson, W.E.L., and T. Lozano-Perez. “Model-Based
Recognition and Localization from Sparse Range
or Tactile Data,” International Journal of Robotics Research,
3, 1984.

Grimson, W.E.L., and T. Lozano-Perez. “Model-Based
Recognition and Localization from Tactile Data,”
IEEE Computer Society Int. Conf. on Robotics, Atlanta,
March 1984.

Grimson, W.E.L., and T. Lozano-Perez. “Recognition
and Localization of Overlapping Parts from
Sparse Data in Two and Three Dimensions,” IEEE
Computer Society Int. Conf. on Robotics, St. Louis,
March 1985.

Grimson, W.E.L., and T. Lozano-Perez. “Model-Based
Recognition and Localization from Sparse Range
Data,” in Techniques for 3-D Machine Perception,
A. Rosenfeld (ed.), North Holland, Amsterdam, 1985.

Grimson, W.E.L., and T. Lozano-Perez. “Search
and Sensing Strategies for Recognition and Localization
of Two and Three Dimensional Objects,” Third
Int. Symp. on Robotics Research, Gouvieux, France,
October 1985. Published by MIT Press, Cambridge,
Mass.

Grimson, W.E.L., and T. Lozano-Perez. “Recognition
and Localization of Overlapping Parts from
Sparse Data,” in Three-Dimensional Vision Systems,
T. Kanade (ed.), Kluwer Academic Publishers,
1985.

Grimson, W.E.L. “The Combinatorics of Local
Constraints in Model-Based Recognition and Localization
from Sparse Data,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 763,
1984.

Grimson, W.E.L., and T. Lozano-Perez. “Recognition
and Localization of Overlapping Parts from
Sparse Data,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 841, 1985.

Hillis, D. “The Connection Machine,” Massachusetts
Institute of Technology Department of Electrical
Engineering and Computer Science Ph.D. Thesis, 1985.

Hopfield, J.J. “Neurons with graded response have
collective computational properties like those of two-state
neurons,” Proc. Natl. Acad. Sci. USA, 81, 1984.

Horn, Berthold K.P. “On Lightness,” Massachusetts
Institute of Technology Artificial Intelligence Laboratory
Memo 295, 1974.

Horn, Berthold K.P. Robot Vision, MIT Press
and McGraw-Hill, 1985.

Horn, B.K.P. “Obtaining shape from shading information,”
in The Psychology of Computer Vision,
P.H. Winston (ed.), McGraw-Hill Publ., New York, 115-155,
1975.

Horn, B.K.P. “Understanding image intensities,”
Artificial Intelligence, 8, 201-231, 1977.

Horn, B.K.P., and Schunck, B.G. “Determining
optical flow,” Artificial Intelligence, 17, 185-203, 1981.

Hurlbert, Anne. “Color computation in the visual
system,” Massachusetts Institute of Technology Artificial
Intelligence Laboratory Memo 8M, 1985, in press.

Hurlbert, Anne, and Tomaso Poggio. “Associative
learning of standard regularizing operators in early vision,”
Massachusetts Institute of Technology Artificial
Intelligence Laboratory Working Paper 204, 1984.

Kirkpatrick, S., Gelatt, C.D., Jr., and Vecchi, M.P.
“Optimization by simulated annealing,” Science, 220,
1983.


Koch, Christof, and Ullman, Shimon. “Selecting
one among the many: a simple network implementing
shifts in selective visual attention,” Massachusetts Institute
of Technology Artificial Intelligence Laboratory
Memo 770, C.B.I.P. Paper 003, 1984.

Koch, Christof, Jose Marroquin, and Alan L.
Yuille. “Analog ‘neuronal’ networks in early vision,”
Massachusetts Institute of Technology Artificial Intelligence
Laboratory Memo 751, 1985.

Land, Edwin H. “Recent advances in retinex theory
and some implications for cortical computations: colour
vision and the natural image,” Proceedings of the National
Academy of Sciences, 80, 1983.

Lozano-Perez, T., and W.E.L. Grimson. “Recognition
and localization of overlapping parts from sparse
data,” Second Int. Symp. on Robotics Research, Kyoto,
Japan, August 1984. Published by MIT Press, Cambridge,
Mass.

Maloney, Laurence T. “Computational approaches
to color constancy,” Stanford University Tech. Report
1985-01, 1985.

Marr, David. “Early processing of visual information,”
Phil. Trans. R. Soc. London B275, 1976.

Marr, David, and Keith Nishihara. “Representation
and recognition of the spatial organization of three
dimensional shapes,” Proc. R. Soc. Lond. B, 200, 1978.

Marroquin, J. “Surface reconstruction preserving
discontinuities,” Massachusetts Institute of
Technology Artificial Intelligence Laboratory Memo
792, 1984.

Marroquin, J. “Optimal Bayesian estimators for image
segmentation and surface reconstruction,” Massachusetts
Institute of Technology Artificial Intelligence
Laboratory Memo 839, 1985.

Marroquin, J. “Probabilistic solution of inverse
problems,” Ph.D. Thesis, Massachusetts Institute of
Technology, 1985.

Metropolis, N., A. Rosenbluth, M. Rosenbluth, A.
Teller, and E. Teller. “Equation of State Calculations
by Fast Computing Machines,” J. Chem. Phys., 21,
1953.

Morozov, V.A. Methods for Solving Incorrectly
Posed Problems, Springer-Verlag, New York,
1984.

Poggio, Tomaso. “Early vision: from computational
structure to algorithms and parallel hardware,”
Computer Vision, Graphics, and Image Processing,
31, 1985.


Poggio, Tomaso, and Vincent Torre. “Ill-posed
problems and regularization analysis in early vision,”
Massachusetts Institute of Technology Artificial Intelligence
Laboratory Memo 773, C.B.I.P. Paper 001, 1984.

Poggio, Tomaso, and Christof Koch. “An analog
model of computation for the ill-posed problems of early
vision,” Massachusetts Institute of Technology Artificial
Intelligence Laboratory Memo 783, C.B.I.P. Paper 002, 1984.

Poggio, Tomaso, Vincent Torre, and Christof Koch.
“Computational vision and regularization theory,”
Nature, 317, 1985.

Poggio, Tomaso, Harry Voorhees, and Alan L.
Yuille. “Regularizing Edge Detection,” Massachusetts
Institute of Technology Artificial Intelligence Laboratory
Memo 776, 1984.

Ponce, Jean, and Michael Brady. “Towards a surface
primal sketch,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 824,
1985.

Rubin, John, and Whitman Richards. “Colour Vision:
Representing Material Categories,” Massachusetts
Institute of Technology Artificial Intelligence Laboratory
Memo 764, 1984.

Saxberg, Bror V. H. “Parameters of a Three-Dimensional
Free-Fall Trajectory From its Two-Dimensional
Central Projection,” Massachusetts Institute of
Technology Department of Electrical Engineering and
Computer Science Master's Thesis, 1985.

Terzopoulos, Demetri. “Multilevel computational
processes for visible surface representation,” Computer
Vision, Graphics, and Image Processing, 24, 1983.

Terzopoulos, Demetri. “Multiresolution Computation
of Visible Surface Representations,” Massachusetts
Institute of Technology Department of Electrical Engineering
and Computer Science Ph.D. Thesis, 1984.

Terzopoulos, Demetri. “Computing visible-surface
representations,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 800, 1985.

Terzopoulos, Demetri. “Concurrent multilevel relaxation
algorithms,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 851,
1985.

Terzopoulos, Demetri. “Image analysis using
multigrid relaxation methods,” IEEE Trans. Pattern
Analysis and Machine Intelligence, 1986, in
press.

Terzopoulos, Demetri. “Regularization of inverse
visual problems involving discontinuities,” IEEE
Trans. Pattern Analysis and Machine Intelligence,
1986, in press.

Tikhonov, A. N., and V. Y. Arsenin. Solution of
Ill-Posed Problems, Winston and Wiley Publishers,
Washington D.C., 1977.

Torre, Vincent, and Tomaso Poggio. “On edge detection,”
Massachusetts Institute of Technology Artificial
Intelligence Laboratory Memo 768, 1984.

Ullman, Shimon. “Visual routines,” Cognition, 18,
1984.

Wandell, Brian, and Laurence Maloney. “Computational
methods for colour identification,” Proceedings
Optical Society of America, 1984.

Winston, Patrick H. “Learning new principles from
precedents and exercises,” Artificial Intelligence, 10,
1981.

Winston, Patrick H. “Learning by augmenting rules
and accumulating censors,” Massachusetts Institute of
Technology Artificial Intelligence Laboratory Memo
678, 1982.

Yuille, Alan L., and Tomaso Poggio. “Scaling Theorems
for Zero-crossings,” Massachusetts Institute of
Technology Artificial Intelligence Laboratory Memo
722, 1983.

Yuille, Alan L., and Tomaso Poggio. “Fingerprints
Theorems for Zero-crossings,” Massachusetts Institute
of Technology Artificial Intelligence Laboratory Memo
730, 1983.

Yuille, Alan L. “A method for computing spectral
reflectance,” Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 752, 1984.




THE  SRI  IMAGE  UNDERSTANDING  RESEARCH  PROGRAM 

Martin A. Fischler (Principal Investigator)

Artificial Intelligence Center
SRI International, Menlo Park, California


ABSTRACT 

The SRI Image Understanding program is a broad effort
spanning the entire range of machine vision research. Its three
major concerns are: (1) to develop an understanding of the
physics and mathematics of the vision process, (2) to develop a
knowledge-based framework for integrating and reasoning about
sensed (imaged) data, and (3) to develop a machine-based environment
for effective experimentation, demonstration and evaluation
of our theoretical results, as well as providing a vehicle
for technology transfer. This report describes recent progress in
all three areas. In particular, we describe progress in constructing
and testing a state-of-the-art automated system for stereo
compilation, new approaches to extracting depth and structural
information from imaged data, a knowledge-based system for
feature extraction, and tools for scene modeling and interaction
with a machine terrain data base.

1  INTRODUCTION 

The goal of this research program is to obtain solutions to
fundamental problems in computer vision; particularly to such
problems as stereo compilation, feature extraction, and general
scene modeling that are relevant to the development of an automated
capability for interpreting aerial imagery and the production
of cartographic products.

To achieve this goal, we are engaged in investigations of such
basic issues as image matching, partitioning, representation, and
physical modeling. However, high-level, high-performance vision
requires the use of both intelligence and stored knowledge (to
provide an integrative framework), as well as an understanding
of the physics and mathematics of the imaging process (to
provide the basic information needed for a reasoned interpretation
of the sensed data). Thus, a significant portion of our
work is devoted to developing new approaches to the problem of
“knowledge-based vision.” Finally, vision research cannot proceed
without a means for effective implementation, demonstration,
and experimental verification of theoretical concepts; we
have developed an environment in which some of the newest
and most effective computing instruments can be employed for
these purposes.

The research results described in this report are partitioned
into three topic areas: (1) three-dimensional scene modeling and
stereo reconstruction; (2) feature extraction: scene partitioning
and semantic labeling; and (3) interactive scene modeling and
knowledge-base construction.


2  THREE-DIMENSIONAL 
SCENE  MODELING  AND 
STEREO  RECONSTRUCTION 

Our goal in this research area is to develop automated
methods for producing a 3-D scene model from several images
recorded from different viewpoints. The standard approach to
this is to use stereo compilation, a technique that
involves finding pairs of corresponding scene points in two images
(which depict the scene from different spatial locations)
and using triangulation to determine scene depth. Various factors
associated with viewing conditions and scene content can
cause the matching process to fail; these factors include occlusion,
projective or imaging distortion, featureless areas, and repeated
or periodic scene structures. Some of these problems can
only be solved by providing the machine with a global context
for dealing with the missing or ambiguous information. Thus,
an important component of this research effort, discussed in the
section on interactive scene modeling, is to devise machinery by
which a human operator can simply and effectively provide the
needed information. In the remainder of this section we limit
our discussion to direct approaches: more effective methods
for image matching, interpolation for filling in "holes" caused by
matching failure, and some exciting and radically new methods
for 3-D modeling.

2.1 Baseline Stereo System

As a framework for integration and evaluation of our research
in modeling 3-D scene geometry, as well as a vehicle for
technology transfer, we have implemented a complete "state-of-the-art"
stereo system. This system, described in Hannah [6] [7], is
capable of producing a dense 3-D scene model from stereo pairs
of intensity images. Included in these references are results of
testing the system on a number of significant data sets. We
believe that the current version of this fully automatic system is
comparable to the best of the semiautomatic (human-assisted)
systems now in operational use.

2.2 New Methods for Stereo Compilation

As previously indicated, the conventional approach to
recovering scene geometry from a stereo pair of images is based on
the matching of distinctive scene features as well as on the
satisfaction of constraints imposed by the viewing geometry (e.g.,
the epipolar constraint). Typically, three steps are required:
(1) determination of the relative orientation of the two images,
(2) computation of a sparse depth map, and (3) derivation of a
dense depth map for the given scene.

In the first step, points corresponding to unmistakable scene
features are identified in each of the images. The relative
orientation of the two images is then calculated from these points.
This is, in part, an unconstrained matching task: corresponding
image features must be found. Without a priori knowledge,
such a matching procedure knows neither the approximate
location (in the second image) of a feature found in the first image,
nor the appearance of that feature. However, it is often the case
that appearance will vary little between images and that they
were taken from similar positions relative to the scene.

Recovery of the relative orientation of the images reduces the
computation of a sparse depth map from unconstrained
two-dimensional matching to constrained one-dimensional matching.
The quest for a scene feature identified in the first image is
reduced to a one-dimensional search along an (epipolar) line in
the second image. Identification of this feature in the second
image makes it possible to calculate the feature's disparity and,
hence, its relative scene depth.
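
The step from disparity to depth can be illustrated with a minimal
sketch. It assumes, beyond what the text states, an idealized rectified
pair (parallel optical axes, identical cameras, epipolar lines along
image rows), for which similar triangles give Z = f B / d. The function
name and its parameters are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a scene point seen with the given disparity (pixels).

    Assumption for illustration only: a rectified pair with parallel
    optical axes, focal length `focal_px` in pixels, and camera
    separation `baseline_m` in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # A feature found at column 412 in the first image and at column 400
    # on the same row (the epipolar line) of the second image.
    disparity = 412 - 400
    print(depth_from_disparity(disparity, focal_px=1200.0, baseline_m=0.5))
```

Because depth varies as the reciprocal of disparity, a fixed error in
disparity produces a depth error that grows with the square of the
distance, which is one reason matching accuracy matters so much.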

Identification of corresponding points in the two images is
typically based on correlation techniques. Area-based correlation
processes may be applied directly to the raw image irradiances
or to images that have been preprocessed in some manner. Edges
(identified by the zero crossings of the Laplacian of their image
irradiances) have also been used to obtain correspondences.
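
A minimal sketch of area-based correlation along an epipolar line
follows. It assumes a rectified pair, so that the search runs along one
image row, and it uses normalized cross-correlation as the similarity
measure; both choices, and all names, are illustrative assumptions
rather than a description of any system cited in this report.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # featureless window: correlation is undefined
    return sum(x * y for x, y in zip(da, db)) / denom


def match_along_scanline(left_row, right_row, x_left, half_width, max_disparity):
    """Return (best_disparity, best_score) for the window centred at x_left.

    Only the same row of the second image is searched, i.e. the epipolar
    line of an (assumed) rectified pair.
    """
    lo, hi = x_left - half_width, x_left + half_width + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("window does not fit in the left row")
    window = left_row[lo:hi]
    best = (None, -2.0)  # any real score lies in [-1, 1]
    for d in range(0, max_disparity + 1):
        r_lo, r_hi = lo - d, hi - d
        if r_lo < 0:
            break  # candidate window would leave the image
        score = ncc(window, right_row[r_lo:r_hi])
        if score > best[1]:
            best = (d, score)
    return best
```

The featureless-window case in `ncc` corresponds to one of the failure
modes listed earlier: where the intensity is constant, every candidate
position scores the same and no match can be preferred.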

The outcome of this second step is a sparse map of the scene's
relative depth at those points that were identified in both images
of the stereo pair.

A sparse depth map does not define the scene topography.
The third and final step in recovering the topography of the
scene is "filling in" this sparse map to obtain a dense depth map
of the scene. Typically, a surface interpolation or approximation
method is used as a means of calculating the dense depth map
from its sparse counterpart. A surface approximation model
may be formulated to provide desirable image properties (such
as the lack of additional zero crossings in the Laplacian of the
image irradiances that are artifacts of the surface
approximation model), but, often, the surface model is based on a priori
requirements for the fitted surface, such as smoothness.
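
The "filling in" step can be sketched with one very simple smoothness
prior: each unknown cell is repeatedly replaced by the average of its
neighbours while measured cells are held fixed (a discrete membrane, or
Laplace, interpolant). This prior is chosen here only for brevity; it is
an assumption of the example, not the interpolation method of any system
described in this report.

```python
def fill_sparse_depth(sparse, iterations=500):
    """Fill unknown cells (None) of a 2-D grid by neighbour averaging.

    Known cells are never modified. The grid is a list of equal-length
    rows; the result is a new grid of floats.
    """
    rows, cols = len(sparse), len(sparse[0])
    known = [v for row in sparse for v in row if v is not None]
    if not known:
        raise ValueError("at least one depth sample is required")
    start = sum(known) / len(known)  # initial guess for unknown cells
    depth = [[start if v is None else float(v) for v in row] for row in sparse]
    for _ in range(iterations):
        for r in range(rows):
            for c in range(cols):
                if sparse[r][c] is not None:
                    continue  # measured depth: keep as-is
                neighbours = [
                    depth[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]
                depth[r][c] = sum(neighbours) / len(neighbours)
    return depth
```

Such an interpolant is "blind" in the sense used later in this section:
it consults only the sparse depths and never the image irradiances.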

The problems encountered in the first two steps, recovery
of the relative orientation of the images and computation of the
sparse depth map, are dominated by the problems of image
matching. False matches that arise from repetitive scene
structures, such as windows of a building, or from image features that
are not distinctive (at least on the basis of local evidence) are
more [...] in the unconstrained matching environment
than in the constrained environment. In recovering the relative
orientation of the images, we can use redundant information in
an effort to reduce the influence of false matches; this is more
difficult in the case when the sparse depth map is computed.
Furthermore, we have little choice as to which features we may
use for sparse depth mapping; if we choose not to use a
feature, we cannot recover the relative depth at that scene point
(without invoking semantic or contextual knowledge).

The selection of suitable features for determining image
correspondence is difficult in itself. Correlation techniques embed
assumptions that are often violated by the best image features.
Area-based correlation techniques usually reflect the premise
that image patches are of a scene structure that is positioned
at one distinct depth, whereas edges that arise at an object's
boundaries are surrounded by surfaces at different scene depths.
Edge-based techniques are based on the assumption that an edge
found in one image is not "moved" by the change in viewing
position of the second image, whereas zero crossings found at
boundaries of objects whose surface gradients are tangential to the line
of sight contradict this assumption. These would seem minor
problems, were it not for the accuracy required of the
matching process. Often, the spatial resolution of disparity
measurements must be better than the image's spatial resolution. Stereo
matching sometimes requires features with properties that are
incompatible with what is practical in realistic situations.

The third step, derivation of a dense depth map from a sparse
one, is still far short of having an adequate solution. Most
approaches employ "blind" interpolation, since no effective
methods are currently in use for extracting depth from the irradiance
data in the individual images of the stereo pair.

In summary, we see that the most demanding steps in the
stereo process are the final two: computation of a sparse depth
map, and derivation of its dense counterpart. In Smith [13],
we describe a new approach to stereo compilation that involves
combining these steps to recover a dense relative-depth map of
the scene directly from the image data. We use image irradiance
profiles as input to an integration routine that returns the
corresponding dense relative-depth profile. This procedure neither
matches image points (at least not in the conventional sense),
nor does it "fill in" data to obtain the dense depth map. It
avoids the need to make the restrictive assumptions usually
required for stereo image matching, and it directly uses the image
irradiance data in recovering the dense depth map.

2.3  New  Methods  for  3-D  Modeling 

Using  Methods  Which  Do  Not  Depend  On 
Stereo  Correspondence 

We have noted the fact that it will not always be possible
to find corresponding scene points in the two images of a
conventional stereo pair, and yet, to recover a dense scene model,
we need to determine the depth at every scene point. Since
interpolation will not always provide an acceptable answer when
matching fails, we are investigating a number of new techniques
for recovering scene depth that do not require establishing stereo
correspondence.

A significant body of work exists in the area of extracting
depth from the shading and texture visible in a single image.
However, these different techniques make a variety of distinct
assumptions about the nature of the scene, the illumination,
and the imaging geometry. In Strat and Fischler [14], we show
that the distinct assumptions employed by each of these
different schemes must be equivalent to providing a second (virtual)
image of the original scene, and that all of these different
approaches can be translated into a conventional stereo formalism.
In particular, we show that it is frequently possible to
structure the problem as that of recovering depth from a stereo pair
consisting of a conventional perspective image (i.e., the
original image) and an orthographic image (the virtual image). We
also provide a new algorithm needed to accomplish this type of
stereo-reconstruction task.
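
Why a perspective image paired with an orthographic one determines
depth can be seen in a deliberately simplified sketch. It assumes that
the virtual orthographic view projects along the same optical axis as
the real camera and shares its image coordinate frame; the general case
treated in [14] is not reproduced here, and the function name is
hypothetical.

```python
def depth_from_perspective_and_orthographic(u_persp, x_ortho, focal):
    """Depth Z of a scene point from one perspective and one orthographic view.

    Perspective view:   u = focal * X / Z
    Orthographic view:  x = X   (assumed: projection along the same axis)
    Hence Z = focal * x / u for any point off the optical axis.
    """
    if u_persp == 0:
        raise ValueError("depth is unobservable for points on the optical axis")
    return focal * x_ortho / u_persp


if __name__ == "__main__":
    # A point at X = 2, Z = 10 viewed with focal length 50 projects to
    # u = 50 * 2 / 10 = 10 in the perspective image and x = 2 in the
    # orthographic one.
    print(depth_from_perspective_and_orthographic(10.0, 2.0, 50.0))
```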




In Pentland [10], we show how focal gradients (image "blur"),
resulting from the limited depth of field inherent in most optical
systems, can be used to recover scene depth. The advantages of
this technique are that it is fast, computationally simple, makes
no special assumptions about the scene, and avoids the
stereo-matching problem. Mathematical analysis and experiments
indicate that the accuracy achievable by this technique is
comparable to what can be expected from the use of stereo disparity
or motion parallax in determining scene depth.
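
The geometric relation behind depth from focal blur can be sketched
with the thin-lens equation. The sketch below assumes an ideal thin
lens and a known blur-circle diameter; how the blur is measured from an
image, which is the substance of [10], is not addressed. All names are
hypothetical, and all lengths share one unit.

```python
def blur_circle_diameter(depth, focal_len, f_number, sensor_dist):
    """Blur-circle diameter on the sensor for a point at distance `depth`."""
    if depth <= focal_len:
        raise ValueError("point must lie beyond the focal length")
    aperture = focal_len / f_number
    image_dist = focal_len * depth / (depth - focal_len)  # thin-lens equation
    return aperture * abs(sensor_dist - image_dist) / image_dist


def depth_from_blur(blur, focal_len, f_number, sensor_dist):
    """Invert blur_circle_diameter for points beyond the plane in focus.

    For such points the sharp image forms in front of the sensor
    (image_dist < sensor_dist), which removes the sign ambiguity.
    """
    aperture = focal_len / f_number
    image_dist = aperture * sensor_dist / (aperture + blur)
    if image_dist <= focal_len:
        raise ValueError("blur too large for this lens setting")
    return focal_len * image_dist / (image_dist - focal_len)
```

The restriction to points beyond the plane of focus is a design choice
of this example: a single blur measurement cannot by itself tell whether
a point lies in front of or behind that plane.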

For most purposes concerned with the analysis of imaged data,
determination of an array of depths (e.g., as obtained by
conventional stereo methods) is only the first step in the construction
of a scene description. The conventional approach next compiles
largely continuous surfaces from the discrete depth information
and then attempts to partition these surfaces into coherent 3-D
objects. Aside from some still unsolved theoretical problems,
this process is computationally expensive and time consuming.
In Bolles and Baker [2], we describe a new method for using
camera motion through a scene to obtain a 3-D model in which
higher level scene attributes are directly accessible. This
technique is based on considering a dense sequence of images as
forming a solid block of data. Slices through this solid at
appropriately chosen angles intermix time and spatial data in such
a way as to simplify the partitioning problem: these slices have
more explicit structure than the conventional images from which
they were obtained. We believe that this work is a very
important development; it offers a completely new and direct method
for accessing information about scene objects without requiring
a completely bottom-up analysis process.
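
The idea of slicing a block of images can be sketched for the simplest
case. The sketch assumes a camera that translates by a fixed step per
frame, parallel to the image rows and perpendicular to its optical axis;
a feature then traces a straight line in the slice, and the slope of
that line is inversely proportional to depth. The function names and the
motion model are assumptions of the example, not details taken from [2].

```python
def epipolar_plane_slice(frames, row):
    """Stack one image row from every frame into a (time x column) slice.

    `frames` is a sequence of images, each a list of rows of intensities.
    """
    return [list(frame[row]) for frame in frames]


def depth_from_slice_slope(pixels_per_frame, focal_px, camera_step):
    """Depth of a feature whose track moves `pixels_per_frame` in the slice.

    Assumes lateral camera translation of `camera_step` per frame and a
    focal length of `focal_px` pixels.
    """
    if pixels_per_frame == 0:
        raise ValueError("a stationary track corresponds to infinite depth")
    return focal_px * camera_step / abs(pixels_per_frame)
```

In such a slice, near objects produce steeply shifting tracks and far
objects nearly stationary ones, so occlusion shows up as one track
terminating against another.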


3  FEATURE  EXTRACTION:  SCENE 
PARTITIONING  AND  SEMANTIC 
LABELING

Creating a scene description from a photographic image
requires the ability to perform two basic operations: (a)
partitioning the image into independent or coherent pieces, and (b)
assigning names or semantic labels to these pieces.

The partitioning operation, necessary to reduce the
computational complexity of the subsequent scene-analysis steps, has
proven to be extremely difficult to accomplish: the performance
of automated systems is still far inferior to that of humans. In
part, this disparity in performance occurs because humans
appear to employ contextual knowledge and past experience in
such tasks, while most available computational techniques
employ only the local intensity patterns visible in the image, i.e.,
they perform "syntactic partitioning." For practical as well
as theoretical reasons, we have been pursuing an investigation
(1) to determine the competence limits of a purely syntactic
approach to partitioning and, simultaneously, (2) to construct
an operational system that approaches these limits. This
investigation is nearing completion and has resulted in a very high
performance system that will be described in a paper by Laws
now in preparation.

In Barnard [1], we describe one of a number of on-going
investigations that attempt to provide a theoretical basis for the
partitioning process. In this paper, Barnard explores the idea
that partitioning decisions result in alternative descriptions of a
scene, and that the preferred partitioning is the one that
provides the "simplest" description. In a paper by Fischler and
Bolles [3], partitioning is viewed as an explanation of how the
image is related to the scene from which it derived; it is
shown that completeness and stability of explanation, as well as
simplicity, are useful partitioning criteria since these attributes
are necessary for an explanation to be believable.

In Fua and Hanson [5], we describe an approach to the
problem of converting a syntactically partitioned image (e.g., one
provided by Laws' segmentation system) into a semantic
description. This work has resulted in a system that can
extract cultural objects from aerial imagery; it employs
geometric reasoning to identify semantically significant arrangements
of straight line segments in the borders of the supplied
partition. Emphasis is placed on using generic models characterizing
significant kinds of geometric relationships and shapes, thereby
avoiding the well-known drawbacks inherent in the use of
specific object templates. An important feature of this system (still
under development) is the generation of an explanation for any
detected discrepancy between the hypothesized object models
and the initial partition. In principle, this technique should
permit intelligent compensation for anomalies due to imaging or
environmental effects that would be recognized by a well-briefed
human analyst; for example, the system should be able to
identify two contrasting regions of a peaked roof as belonging to a
single house based on illumination effects consistent with the
known sun position. The ability of this system to explain its
decisions in terms of deviations of sensed data from stored models
appears to offer an effective mechanism for understanding the
operation of the system and, simultaneously, a basis for
improving its performance.


4  INTERACTIVE  SCENE  MODELING 
AND  KNOWLEDGE-BASE 
CONSTRUCTION 


Our intent in this effort is to develop a system framework
for allowing higher-level knowledge to guide the detailed
interpretation of imaged data by autonomous scene analysis
techniques. Such an approach allows symbolic knowledge, provided
by higher-level knowledge sources, to control automatically the
selection of appropriate algorithms, adjust their parameters, and
apply them in the relevant portions of the image. More
significantly, we are attempting to provide an efficient means for
supplying and using qualitative knowledge about the semantic
and physical structure of a scene so that the machine-produced
interpretation, constrained by this knowledge, will be consistent
with what is generally true of the overall scene structure, rather
than just a good fit to locally applied models.

An important component of our approach is to design a means
for a human operator simply and effectively to provide the
machine with a qualitative scene description in the form of a
semantically labeled 3-D "sketch." This capability for effective
communication between a human and a machine about the
three-dimensional world requires both appropriate graphics tools and
an ability on the part of the machine for both spatial reasoning
and some semantic "understanding." The importance of this
work derives from the fact that a major difficulty in
automating the image-interpretation process is the inability of current
computer systems to deduce, from the visible image content, the
general context of the scene (e.g., urban or rural; season of the
year; what happened immediately before, and what will happen
immediately after, the image was viewed by the sensor); the
knowledge base and reasoning required for such an ability is well
beyond what the state of our art can hope to accomplish over (at
least) the next decade. Thus, our work is intended to provide
a means by which a human can supply, to a task-oriented
program, the high-level overview the program needs for its analysis
of a given scene, but cannot acquire by itself.

4.1  The  Representation  and  Modeling  of  Natural 
Forms 

Our research in this area addresses three related problems:
(1) representing natural shapes such as mountains, vegetation,
and clouds; (2) computing such descriptions from image data;
and (3) interactively providing the machine with a description
of natural forms as a way of building an internal knowledge data
base. The first step towards solving these problems is to obtain
a model of natural surface shapes.

A model of natural surfaces is extremely important because
we face problems that seem impossible to address with
standard descriptive computer-vision techniques. How, for instance,
should we describe the shape of leaves on a tree? Or grass? Or
clouds? When we attempt to describe such common, natural
shapes using standard representations, the result is an
unrealistically complicated model. Furthermore, how can we extract
3-D information from the image of a textured surface when we
have no effective models that describe natural surfaces and how
they evidence themselves in the image? The lack of such a 3-D
model has restricted image-texture descriptions to being ad hoc
statistical measures of the image intensity surface.

Fractal functions, a novel class of naturally arising functions,
are a good choice for modeling natural surfaces because many
basic physical processes (e.g., erosion and aggregation) produce
a fractal surface shape and because fractals are widely used
as a graphics tool for generating natural-looking shapes.
Additionally, in a survey of natural imagery, we found that a fractal
model of imaged 3-D surfaces furnishes an accurate description
of both textured and shaded image regions, thus providing
validation of this physics-derived model for both image texture and
shading.

Progress relevant to inferring 3-D information from imaged
data by use of a fractal model is described in Pentland [9]. A
test has been derived to determine whether or not the fractal model
is valid for a particular set of image data, an empirical method
for computing surface roughness from image data has been
developed, the computation of a 3-D fractal-based representation
from actual image data has been demonstrated, and
substantial progress has been made in the areas of shape-from-texture
and texture segmentation. Characterization of image texture
by means of a fractal surface model has also shed considerable
light on the physical basis for several of the texture-partitioning
techniques currently in use and has made it possible to describe
image texture in a manner that is stable over transformations
of scale and linear transforms of intensity.
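
One common way to attach a number to "roughness" under a fractal model
can be sketched as follows. For a fractal Brownian profile, the mean
absolute intensity difference at lag d grows as d**H, so H can be read
off as the slope of a log-log fit, and the fractal dimension of a 1-D
profile is 2 - H. This estimator is a standard textbook construction
offered as an assumption; it is not claimed to be the test or the
roughness measure developed in [9].

```python
import math


def estimate_h(profile, lags=(1, 2, 4, 8)):
    """Least-squares slope of log mean|I(x+d) - I(x)| against log d."""
    xs, ys = [], []
    for d in lags:
        if d >= len(profile):
            continue  # lag longer than the profile: no differences to take
        diffs = [abs(profile[i + d] - profile[i]) for i in range(len(profile) - d)]
        mean_diff = sum(diffs) / len(diffs)
        if mean_diff > 0:
            xs.append(math.log(d))
            ys.append(math.log(mean_diff))
    if len(xs) < 2:
        raise ValueError("need at least two lags with non-zero differences")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def fractal_dimension_of_profile(profile):
    """Fractal dimension of a 1-D profile under the model above: D = 2 - H."""
    return 2.0 - estimate_h(profile)
```

A perfectly smooth ramp gives H = 1 and dimension 1; rougher profiles
give smaller H and a dimension closer to 2. Note that multiplying the
intensities by a constant changes only the intercept of the fit, not
its slope, which is consistent with the stability under linear
transforms of intensity mentioned above.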

In [11], Pentland describes an interactive system for
modeling natural forms. This system employs superquadrics, as well as
fractal functions, in allowing the user simply and effectively to
create and display almost any iconic object (e.g., the human
form, surfaces with analytic descriptions, natural terrain, etc.).
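
For readers unfamiliar with superquadrics, the standard implicit
("inside-outside") function of a superquadric ellipsoid is sketched
below. The parameter names (a1, a2, a3 for the axis lengths, e1 and e2
for the shape exponents) follow common usage and are assumptions of the
example; the interactive system of [11] is not reproduced.

```python
def superquadric_inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """Implicit function F of a superquadric ellipsoid centred at the origin.

    F < 1 inside the solid, F == 1 on its surface, F > 1 outside.
    e1 = e2 = 1 gives an ellipsoid; exponents near 0 give a box-like
    shape with rounded edges.
    """
    xy = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)


if __name__ == "__main__":
    corner = (0.9, 0.9, 0.9)
    # Outside the unit sphere, but inside the box-like superquadric.
    print(superquadric_inside_outside(*corner, 1.0, 1.0, 1.0, 1.0, 1.0) > 1)
    print(superquadric_inside_outside(*corner, 1.0, 1.0, 1.0, 0.1, 0.1) < 1)
```

Five numbers (three sizes and two exponents) thus cover a family of
shapes from spheres to cylinders to rounded boxes, which is what makes
the representation compact.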

This research is expected to contribute to the development
of (1) a computational theory of vision applicable to natural
surface shapes, (2) compact representations of shape useful for
describing natural surfaces, and (3) real-time modeling,
generation, and display of natural scenes. We also anticipate adding
significantly to our understanding of the way humans perceive
natural scenes.

4.2  Interactive  Modeling  and  Analysis  via 
Machine  Synthesised  Imagery 

Terrain-Calc, described in Quam [12], is a system for
synthesizing realistic sequences of perspective stereo views of real-world
terrain (described within the machine by a database of
geometric and photometric models). This system, implemented on a
Symbolics 3600 Lisp Machine, has a sophisticated graphical
interface, which allows the user to specify an arbitrary flight path
over a modeled piece of terrain. A sequence of views (single
images or stereo pairs, as desired), spaced at equal distances along
the flight path, is generated at about one frame per minute, and
up to [...] frames can be displayed at a rate of sixteen frames per
second. This system is revolutionary in its flexibility,
computational efficiency, and the quality of the renderings it produces,
given that it does not employ any special-purpose hardware.
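
Placing viewpoints "at equal distances along the flight path" amounts
to resampling a path by arc length. The sketch below does this for a
path given as a polyline of waypoints. It is a hypothetical Python
illustration only: Terrain-Calc itself is a Lisp Machine system, and its
path representation is not described here.

```python
import math


def resample_path(waypoints, spacing):
    """Return points spaced `spacing` apart (by arc length) along a polyline.

    `waypoints` is a list of coordinate tuples (2-D or 3-D). The first
    waypoint is always the first sample.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    samples = [tuple(waypoints[0])]
    carried = 0.0  # distance travelled since the last emitted sample
    for p, q in zip(waypoints, waypoints[1:]):
        seg_len = math.dist(p, q)
        pos = spacing - carried  # where the next sample falls in this segment
        while pos <= seg_len:
            t = pos / seg_len
            samples.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
            pos += spacing
        carried = seg_len - (pos - spacing)
    return samples
```

Each returned point would serve as one camera position; at the quoted
rendering rate of about one frame per minute, the number of samples is
also an estimate of the rendering time in minutes.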

4.3 Architectures for Interactive and Real-Time
Machine-Vision Systems

The computational demands imposed by interactive and
real-time machine-vision applications frequently exceed the
capacity of conventional computer architectures. For this reason,
attempts have been made to reduce computation time by
decomposing serial algorithms into segments that can be
simultaneously executed on parallel hardware architectures. Because
many classes of algorithms do not readily decompose, one seeks
some other basis for parallelism. In Fischler and Firschein [4],
we show (1) that guessing the answer to a problem and then
checking its validity is a useful approach, and (2) that a
number of vision algorithms are based on this concept. A parallel
architecture capable of executing such algorithms is described.
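
The guess-then-check idea can be sketched with a small line-fitting
example: candidate lines are guessed from random pairs of points, and
each guess is checked by counting how many points lie close to it. The
problem, the names, and the serial loop are assumptions chosen for
illustration; neither the algorithms nor the architecture of [4] are
reproduced here.

```python
import random


def line_through(p, q):
    """Normalized coefficients (a, b, c) of a*x + b*y + c = 0 through p, q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def check(line, points, tolerance):
    """Verification step: count the points within `tolerance` of the line."""
    a, b, c = line
    return sum(1 for x, y in points if abs(a * x + b * y + c) <= tolerance)


def guess_and_check_line(points, n_guesses=50, tolerance=0.1, seed=0):
    """Guess lines from random point pairs; keep the guess that checks best.

    Every guess is produced and checked independently of the others, so
    the loop body could be spread over parallel processors unchanged.
    """
    rng = random.Random(seed)
    best_line, best_support = None, -1
    for _ in range(n_guesses):
        p, q = rng.sample(points, 2)
        if p == q:
            continue  # degenerate guess (duplicate point in the data)
        line = line_through(p, q)
        support = check(line, points, tolerance)
        if support > best_support:
            best_line, best_support = line, support
    return best_line, best_support
```

The point of the example is that no guess depends on any other, so the
available parallelism equals the number of guesses even though the
underlying fitting problem does not decompose spatially.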

5 ACKNOWLEDGEMENT

The following researchers have contributed to the work
described in this report: H. Baker, S. Barnard, R. Bolles,
O. Firschein, M. Fischler, P. Fua, M. Hannah, A. Hanson,
D.L. Kashtan, K. Laws, A.P. Pentland, L.H. Quam, G.B. Smith,
T. Strat, and H.C. Wolf.




References 

[1] Barnard, S.T., "An Inductive Approach to Figural
Perception," AIC Technical Note 325, SRI International, Menlo
Park, California (September 1984).

[2] Bolles, R.C. and H.H. Baker, "Epipolar-Plane Image
Analysis: A Technique for Analyzing Motion Sequences," to
appear in the Third International Symposium on Robotics
Research, Paris, France (October 1985); also these
proceedings.

[3] Fischler, M.A. and R.C. Bolles, "Perceptual Organization
and Curve Partitioning," Proceedings of the 1983 Image
Understanding Workshop (June 1983); also IEEE CVPR-83.

[4] Fischler, M.A. and O. Firschein, "Parallel Guessing: A
Strategy for High-Speed Computation," AIC Technical
Note 338, SRI International, Menlo Park, California
(September 1984).

[5] Fua, P.V. and A.J. Hanson, "Locating Cultural Regions in
Aerial Imagery Using Geometric Cues," these proceedings.

[6] Hannah, M.J., "Evaluation of STEREOSYS vs. Other
Stereo Systems" and "The Stereo Challenge Data Base,"
AIC Technical Notes 365 and 366, respectively, SRI
International, Menlo Park, California (October 1985).

[7] Hannah, M.J., "SRI's Baseline Stereo System," these
proceedings.

[8] Laws, K.I., "Goal-Directed Textured-Image
Segmentation," Technical Note 334, SRI International (September
1984).

[9] Pentland, A.P., "Fractal-Based Description of Natural
Scenes," Proceedings of the 1983 Image Understanding
Workshop (June 1983); also IEEE CVPR-83.

[10] Pentland, A.P., "A New Sense for Depth of Field,"
Proceedings of International Joint Conference on Artificial
Intelligence, Los Angeles, California (August 1985).

[11] Pentland, A.P., "Perceptual Organization and the
Representation of Natural Form," AIC Technical Note 357, SRI
International, Menlo Park, California (July 1985).

[12] Quam, L.H., "The Terrain-Calc System," these
proceedings.

[13] Smith, G.B., "Stereo Reconstruction of Scene Depth,"
Proceedings of Computer Vision and Pattern Recognition '85,
San Francisco, California (June 19-23, 1985).

[14] Strat, T.M. and M.A. Fischler, "One-Eyed Stereo: A
General Approach to Modeling 3-D Scene Geometry,"
Proceedings of International Joint Conference on Artificial
Intelligence, Los Angeles, California (August 1985); also these
proceedings.




Image  Understanding  Research 
at  Columbia 

John R. Kender¹

Department  of  Computer  Science 
Columbia  University,  New  York,  NY  10027 


0  Abstract 

The Computer Vision Laboratory at Columbia University
continues to focus on problems of middle-level vision. Our
recent emphasis has been on the complexity of image
understanding tasks, and on the implementation of middle-level
algorithms on parallel processors. We have derived
several new algorithms for shape-from methods, including a
provably convergent algorithm for shape-from-shading and an
efficient algorithm for smoothing the optic flow field. We
have demonstrated that the depth interpolation problem is
critically dependent on definitions of "smoothness" and have
derived at least four major classes of algorithms for its
solution in the context of computer vision. Two of these we
have critically compared. We have begun to explore
algorithms on SIMD machines that are computationally and
communicationally efficient for the derivation of sparse
depth from stereo, and for the derivation of full depth from
sparse depth.

In work on high-level vision, we have demonstrated some
theoretic deficiencies in the generalized cylinder approach to
object modelling. Additional work on middle- and high-level
vision progresses: on texture and texture gradients, on
object modelling and qualitative shape description, and on
natural language correlates to qualitative aspects of spatial
relations.


1  Introduction 

The Computer Vision Laboratory at Columbia continues its
steady growth. A second professor of computer science
(Peter Allen) and a research associate (Hussein Ibrahim) now
share in directing the work. Faculty, staff, and students
have reached 13 people. Our VAX has been augmented
with three Sun workstations, and our Puma 560 robotic arm
and our Matrix color film recorder are both up and running.
We have graduated another Ph.D. student (David Lee), and
can reasonably expect at least one other in the coming year.

Our research investigations fall into the following
categories.

- The analysis of the complexity of middle-level
vision algorithms, including depth approximation,
shading, and optic flow, both abstractly via
mathematics and experimentally via psychology
(David Lee and Terry Boult).

- The parallelization on mesh- and tree-connected
SIMD machines of middle- and high-level vision
algorithms, including stereo, depth approximation,
and model matching using extended Gaussian
images (Hussein Ibrahim and Doug Choi).


¹ This work was supported in part by the Defense Advanced Research
Projects Agency under contract N00039-84-C-0165, and in part by an
NSF Presidential Young Investigator Award.


-  The investigation of shape-from-texture
algorithms, including the analysis of textures
under illumination and viewpoint changes, and the
exploitation and coordination of multiple texture
knowledge sources (Mark Moerdler, Paul Douglas,
and George Wolberg).

-  The analysis and implementation of existing
techniques for quantitative and qualitative three-dimensional
shape analysis, with special attention
to aspect graphs and generalized cylinders (David
Freudenstein, Ken Roberts, and Ari Gross).

-  The generation of natural language text from a
representation of spatial relationships,
concentrating on the semantics of words denoting
relative and absolute scaled quantities (Michal
Blumenstyk).

-  Usual system support activities, including the
investigation of efficient half-toning techniques for
various hard copy devices (Earl Smith, laboratory
manager, and Dina Berkowitz, staff).

2  Complexity of Image Algorithms

Using the methods of both numerical analysis and
information-based complexity, we have attempted to quantify
the difficulty of middle-level vision tasks and to provide
optimal algorithms for their solution. We have explored
several such problems, including depth interpolation, shape
from shading, and the smoothing of the optic flow field.
Further, we have investigated several issues related to
optimality, including: the precise definition of a "smooth"
solution to depth, shading, and flow; the human
psychophysical perceptibility of "smoothness"; the
computational efficiency of our algorithms relative to existing
methods; and the information gathering circumstances under
which any of the several "smooth" solutions is the most
appropriate.

2.1  Depth,  Shading,  and  Optic  Flow 

We have demonstrated that the interpolation or
approximation of full depth values from the sparse depth
values derived from stereo or other means is optimally
solved (in the worst case) by splines [9, 14]. Depending on
the imaging situation, this implies that the full depth map
can be obtained in time only linear in the data. Further,
we showed that under many imaging conditions, heuristic
adaptation cannot affect the speed or precision of the result:
the computation can be decision-free, with no loss of
accuracy. We also noted that many of the algorithmic
implementations of this abstract result would be especially
blessed with separabilities and symmetries, and so would be
well suited for hosting on a SIMD machine: a conclusion we
are vigorously pursuing in practice (see Section 3).
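
As a one-dimensional illustration of the linear-time claim, the
following sketch interpolates sparse depth samples with a natural
cubic spline. The tridiagonal system for the knot second derivatives
is solved in one forward and one backward sweep, so the cost grows
linearly with the number of samples. This is only an analogue under
stated assumptions: the 1-D setting, the natural end conditions, and
the function names are illustrative choices, not the surface method
of the report.

```python
from bisect import bisect_right


def spline_second_derivatives(xs, ys):
    """Second derivatives of the natural cubic spline at each knot.

    One forward and one backward sweep over a tridiagonal system,
    so the cost is linear in the number of samples.
    """
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                  # natural ends: m[0] = m[n] = 0
    if n < 2:
        return m
    # Row k of the system corresponds to interior knot k + 1.
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                  - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
    for k in range(1, n - 1):            # forward elimination
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    m[n - 1] = rhs[n - 2] / diag[n - 2]  # back substitution
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m


def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x (xs must be strictly increasing)."""
    n = len(xs) - 1
    i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (ys[i] / h - m[i] * h / 6.0) * a
            + (ys[i + 1] / h - m[i + 1] * h / 6.0) * b)


# Sparse "depth" samples, densified at half-unit spacing.
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0.0, 2.0, 6.0, 8.0]
m = spline_second_derivatives(xs, ys)
dense = [spline_eval(xs, ys, m, 0.5 * k) for k in range(9)]
```

No branch in the solver depends on the data values, which is the
sense in which such a computation can be decision-free.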




This work highlighted the disturbing lack of justification
for the invocation of any particular class of functions as
being a useful model for real world objects. The
mathematical approach we used was general enough so that
the conclusion regarding splines was valid even over a wide
range of object models, but the definition of just what is
being sought remains a free parameter in the approach.
Evidence for supporting one's real world assumptions of
"smoothness" must be gathered extrinsically, a topic which
we are investigating also.

In work on shape from shading, we have reformulated the
mathematical basis for the relaxation methods of Ikeuchi and
Horn. This has led to a new algorithm whose convergence,
under certain general circumstances, is provable [14, 15].
The solution again is cast in terms of smoothing splines,
under a straightforward definition of "smooth" surface.

The solution is unique, iteratively obtainable, and
guaranteed to converge if the reflectance function sufficiently
models the surface properties and if the image intensity
measurements are sufficiently precise. These two
"sufficiently"s are quantifiable; they appear in the analysis
as limits on the penalty parameter lambda used to weight
the relative significance of reconstructed image accuracy
versus reconstructed image smoothness. We obtained limits
for this parameter in the abstract; in general, the more
variation there is in the reflectance map, the less the
accuracy that can be demanded of the reconstruction and
still have convergence guaranteed. In the case of
Lambertian reflectance, we give a numeric range.

In addition, the algorithm has very attractive complexity.
Because it is dominated by the multiplication of a vector by
a sparse matrix with special properties, the fast Fourier
transform can be used instead of the conventional method.
To obtain an image reconstruction with accuracy to O(1/N)
(within the range allowable), the algorithm runs in
O(N5logJN) steps, where N x N is the number of pixels. (We
have yet to test the algorithm on real images.)
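
The report does not say which special properties the matrix has. A
minimal sketch of the general idea, assuming (purely for
illustration) that the matrix is circulant, so that the product is a
circular convolution the FFT can evaluate in O(n log n) instead of
O(n^2) operations. The function names and the radix-2 transform are
hypothetical, not taken from the report.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    With invert=True the conjugate transform is computed and the
    caller is responsible for dividing by len(a).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def circulant_matvec_fft(first_col, x):
    """y = C x for the circulant C whose first column is first_col."""
    n = len(x)
    spectrum = [p * q for p, q in zip(fft(first_col), fft(x))]
    return [v.real / n for v in fft(spectrum, invert=True)]


def circulant_matvec_direct(first_col, x):
    """Reference O(n^2) product, using C[i][j] = first_col[(i - j) % n]."""
    n = len(x)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n))
            for i in range(n)]


# A periodic second-difference operator applied to a short vector.
c = [2.0, -1.0, 0.0, -1.0]
x = [1.0, 2.0, 3.0, 4.0]
fast = circulant_matvec_fft(c, x)
slow = circulant_matvec_direct(c, x)
```

Both routines return the same vector up to rounding; only the
operation count differs.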

In work on smoothing the optic flow field, we applied
similar techniques to the methods of Horn and Schunck, and
of Cornelius and Kanade [15]. Again using smoothing
splines, we can show that within the unit square, the matrix
central to the method is symmetric positive definite. We
were able to bound its minimum and maximum eigenvalues,
which suggests that the Chebyshev iterative method of
solution would be most appropriate and quick to converge.
We have shown that the proposed algorithm must have a unique
solution, although we are as yet unable to guarantee
convergence. Like our work on shape from shading, the
algorithm employs the FFT in lieu of the standard method
for multiplying a vector by the matrix, so it too has low
computational complexity.
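
Eigenvalue bounds matter because the Chebyshev iteration takes them
as its only parameters and needs no inner products, which is
convenient on a parallel machine. The sketch below is the textbook
three-term form of the iteration for a symmetric positive definite
system, not the authors' implementation. The function name, the
`matvec` callback, and the small test matrix are assumptions made
for illustration.

```python
def chebyshev_solve(matvec, b, lam_min, lam_max, iterations=50):
    """Chebyshev iteration for A x = b with A symmetric positive definite.

    lam_min and lam_max must bracket the eigenvalues of A, with
    0 < lam_min < lam_max. matvec(v) must return the product A v.
    """
    theta = 0.5 * (lam_max + lam_min)    # centre of the spectrum
    delta = 0.5 * (lam_max - lam_min)    # half-width of the spectrum
    sigma = theta / delta
    x = [0.0] * len(b)
    r = list(b)                          # residual b - A x for x = 0
    rho = 1.0 / sigma
    d = [ri / theta for ri in r]         # first correction
    for _ in range(iterations):
        x = [xi + di for xi, di in zip(x, d)]
        ad = matvec(d)
        r = [ri - adi for ri, adi in zip(r, ad)]
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = [rho_next * rho * di + (2.0 * rho_next / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_next
    return x


# A 2 x 2 symmetric positive definite example; its eigenvalues lie
# between 2 and 5, which serve as the (loose) spectral bounds.
A = [[4.0, 1.0], [1.0, 3.0]]


def apply_A(v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


solution = chebyshev_solve(apply_A, [1.0, 2.0], 2.0, 5.0)
```

In the setting of the report the `matvec` callback would be the
FFT-based product rather than a dense loop.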

Future work in this area, including implementation, is
detailed in [13].


2.2  Smoothness  and  Efficiency 

We have investigated the ubiquitous problem of image
property smoothness. In much of the work on middle-level
vision, it is necessary to define the desired "smoothness" for
the solution manifold, whether for an object surface or for a
retinal vector field. However, it seems that current
definitions usually have little a priori basis; often it appears
that what is assumed is what is convenient to assume,
rather than what ought to be assumed.

One justification for a definition of smoothness is that it
corresponds to the apparent limits of human beings to
detect changes in higher order derivatives. We investigated
what is known about these limits [1], but there is
surprisingly little in the literature, and most of it is on
smooth motion perception. Nevertheless, we showed that
psychologists who posit formal models of the human vision
system are forced to assume, usually implicitly, specific
limits. But they do not agree, and they invoke a wide
range of differentiabilities for "smooth" objects in the world,
for the "smooth" images they cause, and for the "smooth"
motions they trace. We demonstrated that this
disagreement is reflected in our own field: again, a wide
range of differentiability is used in work on depth
reconstruction, shape from shading, and optic flow.

In work on surface fitting, we have stressed that
"smoothness" is not simply a matter of semantics; the
definition usually has a profound effect on the quality,
accuracy, and complexity of the algorithms necessary to
achieve it. We have shown that it is meaningful to say
"smooth" in at least four fundamentally different ways [2].
Each of these ways invokes different assumptions: for
example, band-limiting the surface, having smoothness be
non-isotropic, incorporating a priori knowledge, etc. We
have discussed the often overlooked difference between
requiring surfaces to interpolate given depth data versus
merely to approximate it; more is known about the
complexity of the former. We have illustrated that any of
these definitions can be realized in one of four basic ways: by
classical minimization, by multigrid methods, or by two types
of reproducing kernel splines. We detailed the advantages
and disadvantages of each realization, and gave guidelines on
how to choose among them under different imaging
situations.

In related work, we have constructed a reference catalog
for the reproducing kernels needed in the spline approach
[3]. In it we gave implementation details and some
examples for the algorithms that arise under the four
definitions of "smooth" already referred to, as well as under
four additional definitions.

Lastly, we have compared Grimson's gradient-projection
approach for depth interpolation to two variants of our
method based on splines [4]. We derived order estimates for
all three algorithms' serial time complexity, serial space
complexity, and parallel time complexity assuming a SIMD
machine. We pointed out that the spline methods always
calculate a unique solution surface, but that the
gradient-projection method may not converge. Further, we
indicated other advantages of spline methods. They produce a
function that can give, along with the dense depth values,
valuable surface properties such as the gradient or local
"smoothness". We found that spline methods are locally
calculable, isotropic, and can serve as compact surface
descriptors. We also showed that, unlike the
gradient-projection method, they are easily extensible to many
definitions of "smooth".


3  Parallelization  of  Image  Algorithms 

We have continued to analyze and encode vision
algorithms to be hosted by Columbia's fine-grained,
tree- and mesh-connected SIMD parallel processor, Non-Von.
A prototype of 63 nodes has been running since early 1985,
when we demonstrated the segmentation of real image data
using quad trees. Most of our work continues in the
abstract, with algorithms verified by a functional simulator.
We have examined algorithms at low- and middle-level, and
have begun work on a high-level image analysis task.


3.1  Low and Middle Level Tasks

We have demonstrated that Non-Von is a cost-effective
processor for low level vision tasks [7]. We have designed
and tested algorithms for binary image tree and quad tree
creation and manipulation, for image correlation,
histogramming, and connected component labeling, and for
geometric property computations such as moments,
compactness, and Euler number. The encoding of the
algorithms incorporated novel approaches to control flow
that reduced the effects of the communication bottleneck
usually associated with tree architectures. We have recently
begun to design and test algorithms for stereo.
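
For readers unfamiliar with the data structure, the following
serial sketch builds a quad tree from a binary image and computes
one geometric property (the area, or zeroth moment) from it. It
shows only the representation. The parallel Non-Von encodings and
their control-flow techniques are not reproduced here, and the
function names and tuple layout are assumptions made for
illustration.

```python
def build_quadtree(image, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform block, else a 4-tuple of subtrees
    ordered (NW, NE, SW, SE).

    The image is a list of rows of 0/1 values and must be square
    with a power-of-two side.
    """
    if size is None:
        size = len(image)
    first = image[y][x]
    if all(image[y + j][x + i] == first
           for j in range(size) for i in range(size)):
        return first
    half = size // 2
    return (build_quadtree(image, x, y, half),
            build_quadtree(image, x + half, y, half),
            build_quadtree(image, x, y + half, half),
            build_quadtree(image, x + half, y + half, half))


def area(tree, size):
    """Number of 1-pixels, computed from the tree alone."""
    if not isinstance(tree, tuple):
        return tree * size * size
    return sum(area(child, size // 2) for child in tree)


image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
tree = build_quadtree(image)
pixels = area(tree, len(image))
```

Uniform regions collapse to single leaves, so properties such as
area are accumulated per block rather than per pixel.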

At the middle level, we have described, simulated, and
analyzed the performance of algorithms for a representative
Hough transform, and for an algorithm that is used in the
interpretation of moving light displays [8]. We concluded
that even in middle level vision it is possible to exploit the
available massive parallelism of a SIMD machine. We
showed that by carefully and inexpensively duplicating data
and/or control information, and by delaying or avoiding the
reporting of intermediate results, it is possible to avoid
many of the communication bottlenecks otherwise common
at this level of image understanding.


3.2  Middle  and  High  Level  Tasks 

Emboldened by our initial success, we have begun to
investigate the design and execution on Non-Von of two
image understanding tasks at higher levels still. The first is
the depth interpolation problem, whose solution via the
adaptive Chebyshev acceleration method appears to be a
natural fit to the tree- and mesh-connections of the machine
[6]. We have simulated several small interpolations, and
compared our speed and accuracy with existing iterative
methods based on Gauss-Seidel. We have begun to
investigate the performance of two other similar methods:
the pure Chebyshev, and the conjugate gradient method.

The second task is still preliminary, but it is probably
rightly called a high level vision problem. We are designing
ways in which Non-Von can parallelize the recognition of
objects from their extended Gaussian images. We hope to
demonstrate that SIMD architectures can be used in model
matching, and might even profit from the use of heuristic
search in their control structures.


4  Analysis of Texture

We have begun an exploration of the properties of fractal
textures under imaging assumptions which are more realistic
than those currently in use. In particular we are examining
the effects that oblique illumination or view angles have on
the fractal dimension of the imaged texture. We wish to
understand these effects enough to invert them, and to
derive from them constraints on surface orientation and
distance.

In applied work on texture, we continue to exploit and
coordinate multiple shape-from-texture knowledge sources
[12]. One source is based on the virtual lines (spacings)
that occur between line-like elements [10]. We have shown
that it is quite robust in the presence of noise and texel
perturbation; it is even capable of handling primitive forms
of object transparency and virtual surfaces. We continue to
expand the scope of the system, and hope to program and
integrate a module that exploits the constraints implicit in
gravitationally-based environmental labels such as
"horizontal" and "vertical" [11].


5  Analysis  of  Three-Dimensional  Shape 

As part of our initial incursion into the study of
three-dimensional representations for shape, we surveyed systems
that explicitly capture and/or internally maintain depth
measurements [5].

Delving further, we have demonstrated the difficulty of
using generalized cylinders as a canonical representation for
object models. Building on work of Shafer, we have shown
that even when only straight homogeneous generalized
cylinders are considered, theorems about the uniqueness of
descriptions of a given shape are difficult to prove. Regular
polyhedra and simple egg-shaped objects are counterexamples
to even minor attempted extensions to an existing theorem
[16].


6  Generating  Text  about  Spatial  Relations 

In a new investigation, we have begun to analyze the
representations and state information that are required for
the generation of natural language description of spatial
relations. In particular, we have noted that many
statements regarding measurable physical quantities such as
length, mass, or time are given qualitatively with respect to
an assumed focal point or reference. Thus, trees are
"nearby", or houses are "too big". We hope to obtain
sufficient insight into the presumed topological
representations underlying such utterances that it can
contribute to our efforts on three-dimensional shape.


References 

1. Boult, T. Smoothness Assumptions in Human and
Machine Vision, and Their Implications for Optimal Surface
Interpolation. Department of Computer Science, Columbia
University, 1985.

2. Boult, T., and Kender, J. On Surface Reconstruction
Using Sparse Depth Data. Proceedings of the ARPA Image
Understanding Workshop, Dec., 1985. (These proceedings.)

3. Boult, T. Reproducing Kernels for Visual Surface
Interpolation. Department of Computer Science, Columbia
University, 1985.

4. Boult, T. Visual Surface Interpolation: A Comparison of
Two Methods. Proceedings of the ARPA Image
Understanding Workshop, Dec., 1985. (These proceedings.)

5. Boult, T. A Survey of Some Three-Dimensional Vision
Systems. SIGART Newsletter, April, 1985, pp. [illegible].

6. Choi, D., and Kender, J. Solving the Depth
Interpolation Problem with the Adaptive Chebyshev
Acceleration Method on a Parallel Computer. Proceedings
of the ARPA Image Understanding Workshop, Dec., 1985.
(These proceedings.)

7. Ibrahim, H.A.H., Kender, J.R., and Shaw, D.E. The
Analysis and Performance of Two Middle-Level Vision Tasks
on a Fine-Grained SIMD Tree Machine. Proceedings of the
Conference on Computer Vision and Pattern Recognition,
IEEE Computer Society, May, 1985, pp. [illegible]. (An
expanded version has been submitted to Computer Vision,
Graphics, and Image Processing.)

8. Ibrahim, H.A.H., Kender, J.R., and Shaw, D.E.
"Low-Level Image Understanding Tasks on Fine-Grained
Tree-Structured SIMD Machines." Journal of Parallel and
Distributed Computing (Submitted for publication 1985).

9. Kender, J.R., Lee, D., and Boult, T. Information-Based
Complexity Applied to Optimal Recovery of the 2-1/2 D
Sketch. Proceedings of the Third Workshop on Computer
Vision: Representation and Control, IEEE Computer
Society, Oct., 1985, pp. [illegible]-157.

10. Kender, J.R., and Moerdler, M. Surface Orientation
and Segmentation from Perspective Views of Parallel-Line
Textures. Department of Computer Science, Columbia
University, Jan., [year illegible].

11. Kender, J.R. "Environmental Labelings in Low-Level
Image Understanding." Artificial Intelligence (In revision
1985).

12. Kender, J.R. Artificial Intelligence Research Notes.
Volume [illegible]: Shape from Texture. Pitman Publishing, Ltd.,
London. To appear 1985.

13. Lee, D. "Optimal Algorithms for Image Understanding:
Current Status and Future Plans." Journal of
Complexity (To appear 1985).

14. Lee, D. Contributions to Information-Based
Complexity, Image Understanding, and Logic Circuit
Design. Ph.D. Th., Department of Computer Science,
Columbia University, Sept., 1985.

15. Lee, D. A Provably Convergent Algorithm for Shape
from Shading. Proceedings of the ARPA Image
Understanding Workshop, Dec., 1985. (These proceedings.)

16. Roberts, K. Equivalent Descriptions of Generalized
Cylinders. Proceedings of the Conference on Computer
Vision and Pattern Recognition, IEEE Computer Society,
June, 1985. (Also these proceedings.)



SUMMARY OF PROGRESS IN IMAGE UNDERSTANDING
AT THE UNIVERSITY OF MASSACHUSETTS

Edward M. Riseman and Allen R. Hanson

Computer  and  Information  Science  Department 
University  of  Massachusetts 
Amherst,  Massachusetts  01003 


ABSTRACT 

This research summary documents several areas of research
at the University of Massachusetts that are entirely
or partially supported under the DARPA image understanding
program. The work is divided into several areas:
motion analysis, low-level and intermediate-level processing,
and knowledge-based processing strategies. Some of
this work is documented in several papers in these
proceedings.

1.  MOTION ANALYSIS

Our research in motion analysis continues to broaden
with research in several theoretical and experimental areas.

1.1.  EFFECTIVENESS IN RECOVERING
TRANSLATIONAL MOTION PARAMETERS

We have continued the analysis of algorithms for constrained
sensor motion [LAW84]. In particular we are evaluating
the robustness, accuracy, and efficiency of the algorithm
for recovering translational motion parameters
[PAV85]. Here the global search for the focus-of-expansion
(FOE) requires the computation of the sum of errors (e.g.,
via correlation) associated with the displacement of a set
of feature points in two or more frames. A sparse sampling
of the possible locations of FOEs provides a global error
function whose minimum localizes the direction of motion.
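
The global search described above can be sketched as follows. The
error used here is a simplifying assumption, not the
correlation-based error of [PAV85]: for pure translation each
displacement should lie along the ray from the FOE through its
feature point, so the sketch sums the squared displacement
components perpendicular to those rays. The function names and the
candidate grid are hypothetical.

```python
import math


def foe_error(foe, points, flows):
    """Sum of squared flow components perpendicular to the rays from
    the candidate FOE through each feature point."""
    total = 0.0
    for (px, py), (dx, dy) in zip(points, flows):
        rx, ry = px - foe[0], py - foe[1]
        norm = math.hypot(rx, ry)
        if norm == 0.0:
            continue                     # point coincides with candidate
        cross = (dx * ry - dy * rx) / norm
        total += cross * cross
    return total


def search_foe(points, flows, candidates):
    """Return the candidate FOE with the smallest global error."""
    return min(candidates, key=lambda c: foe_error(c, points, flows))


# Four widely spaced feature points whose displacements expand
# from a true FOE at (2, 3).
true_foe = (2.0, 3.0)
points = [(10.0, 0.0), (0.0, 10.0), (-10.0, 5.0), (7.0, -7.0)]
flows = [(0.1 * (px - true_foe[0]), 0.1 * (py - true_foe[1]))
         for px, py in points]
grid = [(float(x), float(y)) for x in range(-5, 6) for y in range(-5, 6)]
best = search_foe(points, flows, grid)
```

This error cannot tell expansion from contraction, and a practical
search would refine the sparse grid around the minimum. Both points
are left out to keep the sketch short.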

The accuracy and robustness are a function of the number
of points that are tracked and contribute to the error
function, which of course must be traded off against
the amount of computation that can be tolerated for real-time
motion analysis. Thus far, our experiments on simulated
environments imply that there is a wide range of
situations for which the motion parameters can be approximately
recovered at relatively modest computational expense.
Specifically, when the angle between the image plane
and the direction of translational motion is less than 60
degrees, then between 4 and 16 points which are widely
spaced in the image are sufficient to recover the approximate
motion of the sensor. A smaller number of points (4-8
points) is necessary when the camera is oriented approximately
in the direction of motion, and a larger number of
points (8-16) when the camera orientation is at a modest
angle (15 degrees to 45 degrees) with respect to translation.
When the angle between camera orientation and translation
is large (60 degrees to 90 degrees) there appears to be a
flat error surface around the correct motion, leaving a wide
range of ambiguity no matter how many feature points are
employed. This result is not surprising in that it states that
when a camera is pointing out the driver's side window,
accurately determining the motion of a vehicle moving down
the road is not possible.

This work was supported by DARPA under Contract
N009144MC-04M.

1.2.  INHERENT AMBIGUITY IN MOTION ANALYSIS
OF NOISY FLOW FIELDS

In the cases where the sensor motion is unconstrained
and/or there are independently moving objects in the
environment, our algorithms for direct recovery of motion
parameters and environmental structure are not applicable.
Therefore, we turn to the usual method of motion analysis,
which is decomposed into two phases: computation of
an optical flow field and interpretation of this field. In the
present discussion, the term "optical flow field" refers to
a "velocity field", composed of vectors describing the
instantaneous velocity of image elements. The computation
of reliable flow fields is the subject of work presented in
Section 1.3. The second phase, which is the general
interpretation of flow fields, was the subject of previous work by
Adiv [ADI85a,b].

The work discussed in this Section mathematically examines
the robustness of algorithms for interpreting general
motion from flow fields. The analysis focusses on ambiguities
that are inherent in the sense that they are true of
all algorithms, and can only be resolved if constraining
assumptions or other sources of visual information are
employed.

Two problems which may arise due to the presence of
noise in the flow field have been examined. Since noise in
flow fields must be expected almost always to be present,
we believe this analysis is relevant to all real situations of
motion interpretation.

The first ambiguity is in recovering the motion parameters
from a noisy flow field generated by a rigid motion.
Motion parameters of the sensor or a rigidly moving object
may be extremely difficult to estimate because there may
exist a large set of significantly incorrect solutions which
induce flow fields similar to the correct one. We found that if
the field of view corresponding to the region containing the
interpreted flow field is small, and the depth variation and
translation magnitude are small relative to the distance of
the object from the camera, then the determination of the
3-D motion and structure can be expected to be very sensitive
to noise and, in the presence of a realistic level of noise,
practically impossible. We experimentally found that there
is also a relation between the location of the FOE and the
degree of ambiguity.

The second ambiguity is in the decomposition of the
flow field into sets of vectors corresponding to independently
moving objects. The rigidity assumption [ULL79] has been
found to be inappropriate for noisy flow fields; that is, the
consistency of a set of flow vectors with the same motion
parameters, up to the estimated noise level, does not
reasonably guarantee that they are really induced by one rigid
motion. Two independently moving objects may induce
optical flows which are compatible with the same motion
parameters and hence, there is no way to refute the
hypothesis that these flows are generated by one rigid object.
As an alternative to the usual rigidity assumption, it is
assumed in [ADI85a,b] that a connected set of flow vectors,
which is consistent with a rigid motion of a planar surface,
is induced by a rigid motion. This assumption is weaker
than the standard assumption in the sense that it can only
be applied in more restricted situations and, therefore, it is
more likely to be correct.

The results of the ambiguity analysis can be used when
the effectiveness of motion algorithms is evaluated for
real-world tasks. They can help to decide which algorithm to
choose, and in what situations this algorithm can be
expected to be effective. Recovering motion and structure
of independently moving objects may be particularly
difficult, as was demonstrated by the flat error surfaces
obtained for such objects in the second and fifth experiments
in [ADI85b]. In general, ambiguity in recovering 3-D motion
and structure of independently moving objects can be
expected, since the effective field of view and the ratio of
the depth variation to the distance between the object and
the camera are usually small. Even in ambiguous situations,
constraints and parameters might be extracted.
Integration of such partial information over a time sequence
of flow fields may, eventually, resolve the ambiguity and
result in a unique interpretation.

1.3.  RELIABLE COMPUTATION OF OPTIC FLOW:
A SMOOTHNESS CONSTRAINT AND A
CONFIDENCE MEASURE

Although our hierarchical correlation algorithm [GLA83]
for the computation of dense displacement fields has proved
to be an efficient and reliable technique, there are still a
number of situations where the algorithm makes mistakes.
These situations arise in areas of images without significant
intensity variations and at occlusion or motion boundaries.
Our previous work [ANA84] attempted to identify such
situations through the use of a confidence measure which
indicated the reliability of a match vector. Our current work
attempts to improve matches with low confidence based on
neighbouring matches with higher confidences, by means of
a relaxation process.

The confidence measure that was described in [ANA84]
is a scalar value between 0 and 1 that indicated the
reliability of the displacement vector at a pixel in the image.
One such value was provided for each pixel. This measure
was derived by studying the properties of the error-surface
obtained during the process of computing the displacement
at a pixel. However, the image displacement vector is a
two-dimensional quantity. Hence, it is appropriate to have
a two-dimensional confidence measure associated with the
displacement vector.

In our previous work [ANA84], we observed that the
error-surface allowed us to distinguish between situations
where we had completely reliable information
regarding the displacement vector (i.e., at high curvature
points along image contours), where we had partial
information (i.e., at edge locations where only the displacement
perpendicular to the edge can be reliably measured), and
situations where we had no reliable information (at
homogeneous intensity areas of the image). The new confidence
measure is a vector quantity which uses these distinctions.

Our current work consists of two steps. The first is
the computation of these vector valued confidence measures
and the second is the smoothing process which corrects
unreliable displacement vectors based on their reliable
neighbours.

1. The new confidence measure is best described as a
two-dimensional vector. It is convenient to describe the
vector in terms of two orthogonal basis vectors $e_{\max}$
and $e_{\min}$, which vary from pixel to pixel in an image.
The displacement vector D can be decomposed
in terms of its components along these basis vectors,
and confidence measures $c_{\max}$ and $c_{\min}$ are associated
with these components. The basis vectors and the
confidence measures can be easily understood by their
behaviour at a high curvature point, an edge point, and
a point in a homogeneous area of the image.

At a high-curvature point both $c_{\max}$ and $c_{\min}$ will
be high, indicating that all the components of the displacement
vector are highly reliable. In this case the exact directions
of $e_{\max}$ and $e_{\min}$ are not crucial, and will depend on the
precise shape of the contour. At an edge point $c_{\max}$ will be
high and $c_{\min}$ low, and $e_{\max}$ and $e_{\min}$ will respectively be
perpendicular and parallel to the edge. At a homogeneous
area both the confidences will be low, and the directions
of the basis vectors will depend on the details of the image
intensity variations at that point.




Finally, the new confidence measures are also based on
the shape of the correlation error surface. The details of
their computation are described in [ANA85]. It is
worthwhile to note that these are no longer bound to be
between 0 and 1. The formulation of the smoothness
constraint described below requires that these values be
allowed to vary between 0 and ∞.

2. The process of improving unreliable match estimates
based on their neighbours is formulated as a smoothness
constraint on the displacement vector field. The
smoothness constraint consists of two errors, E_smooth
and E_approx, whose sum is minimised.

E_smooth measures the spatial variation of the
displacement field, i.e., the smoother the variation, the
smaller is the error. One example of such a constraint can
be found in the work of Horn and Schunck [HOR81]. E_approx
measures the deviation of the smooth displacement field
from the initial field provided by the matching process.

E_approx = c_max [(U - D) · e_max]^2 + c_min [(U - D) · e_min]^2

where U is the smoothed displacement vector and D is the
initial vector at a pixel provided by the matching process.
The definition of this error makes it clear that the low
confidence estimates are allowed to vary more than the high
confidence estimates. Hence, the smoothing process
modifies the initial displacement values at locations with
low confidence measures more than those at locations with
high confidence measures.
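
The behaviour of the approximation error at a single pixel can be illustrated with a minimal sketch. It assumes that the error is the sum of the squared components of U - D along the two basis vectors, each weighted by its confidence; the function name and the sample values are illustrative and are not taken from [ANA85].

```python
def approx_error(u, d, e_max, e_min, c_max, c_min):
    """Confidence-weighted squared deviation of the smoothed vector u
    from the initial vector d, measured along the basis vectors."""
    diff = (u[0] - d[0], u[1] - d[1])
    along_max = diff[0] * e_max[0] + diff[1] * e_max[1]
    along_min = diff[0] * e_min[0] + diff[1] * e_min[1]
    return c_max * along_max ** 2 + c_min * along_min ** 2

# Edge point on a vertical edge (assumed values): e_max is perpendicular
# to the edge (x axis) and e_min is parallel to it (y axis).
e_max, e_min = (1.0, 0.0), (0.0, 1.0)
d = (2.0, 0.0)

# Changing the poorly known component along the edge is cheap ...
slide_along_edge = approx_error((2.0, 3.0), d, e_max, e_min, c_max=10.0, c_min=0.1)
# ... while changing the well-measured component across the edge is expensive.
move_across_edge = approx_error((5.0, 0.0), d, e_max, e_min, c_max=10.0, c_min=0.1)
```

With these values the same three-pixel change costs 0.9 along the edge and 90 across it, which is the sense in which low confidence components are free to vary.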

The smoothness constraint translates into a minimisation
problem. We solve this problem using the finite-element
method, because this method permits the inclusion of known
discontinuities in the displacement field. The application
of this method leads to a local relaxation algorithm, which
iteratively updates the displacement vector field.
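
To give a feel for what a local relaxation of this kind does, the following is a deliberately simplified sketch and not the finite-element formulation itself. It assumes a one-dimensional field, a single scalar confidence per sample, and a smoothness error equal to the sum of squared differences between neighbours; each update sets one sample to the value that minimises the total error with its neighbours held fixed.

```python
def relax_1d(d, c, iterations=500):
    """Gauss-Seidel relaxation for the 1-D energy
    sum_i c[i] * (u[i] - d[i])**2 + sum_i (u[i+1] - u[i])**2.
    d: initial estimates, c: non-negative confidences (assumed scalar).
    Assumes at least two samples."""
    u = list(d)
    n = len(u)
    for _ in range(iterations):
        for i in range(n):
            neighbours = [u[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Setting the derivative with respect to u[i] to zero gives a
            # confidence-weighted average of d[i] and the neighbours.
            u[i] = (c[i] * d[i] + sum(neighbours)) / (c[i] + len(neighbours))
    return u

# Two reliable end samples and two unreliable interior ones (assumed data).
initial = [0.0, 5.0, -5.0, 3.0]
confidence = [1e6, 0.0, 0.0, 1e6]
smoothed = relax_1d(initial, confidence)
```

The unreliable interior values are pulled onto the line joining the reliable end values (approximately 1 and 2 here), while the end values themselves barely move.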

Our future work will consist of developing techniques
for locating the displacement discontinuities, gaining a
greater understanding of the confidence measures (in
particular how to normalise them) and possible improvements
to the smoothness error.

I.4. REFINEMENT AND PREDICTION OF
IMAGE DYNAMICS AND ENVIRONMENTAL
DEPTH MAPS OVER MULTIPLE FRAMES

To a large extent research in the interpretation of
motion has focussed on the recovery of the motion parameters
of a sensor moving through a static environment, and more
generally the relative motion between a sensor and a visible
object. Under ideal conditions, once these motion
parameters are known, a depth map can be recovered from two
frames if the displacement (flow) field is exact.

In previous sections of this review, we have discussed
various reasons why displacement fields are not perfect.
Even with perfect information about sensor motion,
displacement vectors from translational motion are a function
of the depth of the surface element. Any ambiguity or
error in displacements along linear paths emanating
radially from the FOE leads to ambiguity in the depth of that
surface element. There are several sources of such
ambiguity including multiple minima in the matching process
for computing displacements, noise affecting the match
location, and finally the resolution in the matching process
along that radial path. Consequently, we are viewing the
matching process as a dynamic refinement of depth over
multiple frames.

The work that we discuss here is a first step in the
exploration of several issues involved in the stability,
refinement, and prediction of depth maps over multiple frames
[BHAM]. We are considering the differences in start-up
(when no depth information exists) versus updating an
existing (and possibly inaccurate) depth map; in both
situations we assume limited computational resources are
available, yet increasing accuracy over time is required.

When an image sequence is first acquired, or the
visible field changes dramatically (as in the case of coming
around a corner), no depth map exists and the situation
can be considered as a start-up. Under an assumption of
a fixed limit on the computation that can be carried out
between any pair of frames, a strategy has been developed
to extract a coarse depth approximation from the first pair
of frames using a coarse spatial resolution for the matching
process. Each subsequent frame that is processed can use
the previous estimate of depth to narrow the match area
while increasing the match resolution, thereby maintaining
constant computation, but finer accuracy in the depth
estimates. As this process continues, temporal resolution can
also be reduced as necessary. Thus, the approach employed
involves a combined hierarchical spatial and temporal
resolution as frames continue to arrive.
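
The constant-computation idea can be made concrete with a small sketch. It assumes, purely for illustration, that the search window along the radial path and the match step are both halved from one frame pair to the next; the text does not give the actual factors, and the function names are hypothetical.

```python
def refinement_schedule(initial_window, initial_step, frames):
    """Return one (search_window, match_step) pair per frame pair.
    Window and step are halved together (an assumed schedule)."""
    schedule = []
    window, step = float(initial_window), float(initial_step)
    for _ in range(frames):
        schedule.append((window, step))
        window, step = window / 2.0, step / 2.0
    return schedule

def candidates(window, step):
    """Number of match positions examined along the search path."""
    return int(round(window / step)) + 1

schedule = refinement_schedule(initial_window=32, initial_step=8, frames=4)
costs = [candidates(w, s) for w, s in schedule]
```

Every frame pair examines the same number of candidate positions, while the step, and hence the depth resolution, improves by a factor of eight over the four pairs.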

The refinement strategy that we have just described
for the start-up phase of depth map recovery can be
generalised for updating, prediction, and error analysis.
Under known sensor motion and known environmental depth,
the image location and appearance of environmental
features can be accurately predicted and matched from one
frame to the next (leaving aside complex issues of image
changes due to changes in lighting, highlights, shadows,
shape distortion of surface patches, or occlusion). Thus,
when one reaches the desired level (or limit) of spatial
and temporal resolution, the updating process becomes one
of prediction and verification of the environmental model.
When predictions are not accurate, then depending upon the
representation, the depth of either pixels, points, lines,
regions, or surfaces could be refined in a
focus-of-attention and refinement process for error
reduction. Areas of the image and environment that do not
behave as predicted become the focus of processing until
their image dynamics over time can be properly predicted.
In this manner one has an ongoing mechanism for
verification of the current interpretation of the
environment.


II. IMAGE INTERPRETATION

Work on the VISIONS system for interpretation of
static images continues. The rule based system for
generating initial object hypotheses from image data has been
extended to permit information from multiple sources of
low level data to be "fused" in a consistent manner. On
the basis of the results in a forthcoming thesis by
Weymouth [WEY85], we have refined the notion of schemas as
a representation of knowledge. We are implementing a new
schema system in Common LISP and translating existing
schemas and their associated interpretation strategies into
the new format. We are continuing to explore inferencing
mechanisms based on the Shafer-Dempster-Lowrance idea
of evidential reasoning [SHA76, DEM68, LOW82, WES83,
WES85]. A recent development is a method for generating
mass functions using explicit knowledge about the image
domain without requiring that the range of values over
which the mass functions are defined be either explicitly or
implicitly discretised into 'propositions'.

II.1. RULE-BASED HYPOTHESES FROM
COMPLEX AGGREGATIONS OF
IMAGE EVENTS

In a recent paper [WEY83] we described a simple type
of knowledge source for generating object hypotheses for
particular regions in the image. The rules are defined in
terms of ranges over a scalar feature, and complex rules
are defined as combinations of the output of a set of
simple rules. The scores of these rules serve as a focus of
attention mechanism for other, more complex
knowledge-based processes. The rules can also be viewed as sets of
partially redundant features each of which defines an area
of feature space which represents a "vote" for an object
on the basis of this single feature value. The region
attributes include color, texture, shape, size, image location,
and relative location to other objects. More recently, the
approach has been extended to lines, with features
including length, orientation, contrast, width, etc. In many cases,
it is possible to define rules which provide evidence for and
against the semantically relevant concepts representing the
domain knowledge. While no single rule is totally reliable,
the combined evidence from many such rules should imply
the correct interpretation.
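
A minimal sketch of such rules follows. It assumes that a simple rule returns a score of 1 inside a feature range and falls off linearly outside it, and that a complex rule is a weighted average of simple-rule scores; the ranges, weights and the "sky" example are hypothetical and are not taken from [WEY83].

```python
def simple_rule(value, low, high, ramp):
    """Score in [0, 1]: 1 inside [low, high], falling linearly to 0
    over a distance `ramp` outside the range (assumed shape)."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / ramp)

def complex_rule(scores, weights):
    """Weighted average of simple-rule scores (assumed combination)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical "sky" rules applied to one region: a colour feature and
# the normalised image row of the region centroid (0 = top of image).
region = {"colour": 0.62, "row": 0.35}
colour_score = simple_rule(region["colour"], low=0.55, high=0.70, ramp=0.10)
location_score = simple_rule(region["row"], low=0.0, high=0.30, ramp=0.20)
sky_score = complex_rule([colour_score, location_score], weights=[2.0, 1.0])
```

Here the colour feature votes fully for the label, the location feature votes partially, and the combined score ranks the region for attention by later processes.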

Most of the rules previously described are unary,
accepting a region as input and returning a confidence for
the object label. In addition, simple binary rules, defined
over pairs of regions, were used to determine the
similarity of the regions and to form aggregations of regions with
similar properties. Typically, the rules operate on
primitives formed by a single segmentation process (e.g. regions
or lines) and result in the merging of the primitives into
a more complete description, depending on the confidence
returned by the rules. Forming more abstract groups of
elements in this way has advantages when dealing with
unreliable segmentation processes: fragmented elements can
be grouped to form aggregates which perhaps more closely
match object models.

Recently, we have extended this approach to include
relational rules, which capture expected relations between
the elements of multiple representations (e.g. regions, lines,
surfaces) of the image data [BEL85]. Using rules of this
form, sets of elements across the multiple representations
can be selected and grouped on the basis of relational scalar
measures associated with each rule. The result, providing
the confidence value returned by the rule is high enough, is
the construction of complex aggregations of elements which
satisfy user-specified relations across the multiple
representations. One advantage of this approach is that it is
modular and extensible; when new representations are added
to the system, integration is accomplished by adding the
appropriate rules.

In our preliminary work, we are concerned with
relational rules defined over regions and lines. Since both are
defined in a pixel-based representation, a convenient
basis for the rules is intersection of the corresponding sets of
pixels. Such relational rules, called intersection rules, are
composed of three components:

1) a filtering rule for selecting lines which intersect a
region based on relational measures;

2) a ranking rule which ranks the lines which intersect a
region based on line attributes; and

3) a combination function which calculates the final score
of the region-line aggregation based on the scores from
the filtering rule and the ranking rule.

The relational measures are used to measure the type
and degree of the relationship between a region and a line.
Lines associated with regions are categorised into three
types: boundary lines, interior lines, and lines which are
neither interior nor boundary. The measures are:

1. interior-line-percentage: the ratio of line area interior
to the region to total line area.

2. region-perimeter-percentage: the ratio of region
boundary pixels covered by the line area to the region
perimeter.

3. line-length-percentage: the ratio of the length of the
region boundary covered by the line area to the total
length of the line.
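
Because regions and lines are both pixel sets, the three measures reduce to set intersections. The sketch below is one possible reading and makes several assumptions: lengths and perimeters are counted in pixels, "interior" includes region boundary pixels, and the function and argument names are hypothetical.

```python
def relational_measures(line_pixels, region_pixels, boundary_pixels, line_length):
    """Return the three region-line measures as fractions in [0, 1].
    All arguments except line_length are sets of (x, y) pixel tuples."""
    on_boundary = line_pixels & boundary_pixels
    return {
        # share of the line area that falls inside the region
        "interior_line_percentage": len(line_pixels & region_pixels) / len(line_pixels),
        # share of the region perimeter covered by the line area
        "region_perimeter_percentage": len(on_boundary) / len(boundary_pixels),
        # share of the line length that runs along the region boundary
        "line_length_percentage": len(on_boundary) / line_length,
    }

# A 4 x 4 region, its 12 boundary pixels, and a 6-pixel horizontal line
# that runs along the top row and overshoots the region by 2 pixels.
region = {(x, y) for x in range(4) for y in range(4)}
boundary = {(x, y) for (x, y) in region if x in (0, 3) or y in (0, 3)}
line = {(x, 0) for x in range(-2, 4)}
measures = relational_measures(line, region, boundary, line_length=len(line))
```

For this example two thirds of the line lies on the region boundary, and it covers one third of the region perimeter.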

The filtering rule is then a complex line rule composed
of a simple rule for each relational measure; in many cases
it simply removes certain combinations of regions and lines
from further consideration. The ranking rule ranks each
line on the basis of how well it satisfies the associated
relational measure. The combination rule is supplied the scores
from the filtering rule, the ranking rule, and the relational
measures and converts these into a confidence for the
hypothesis supported by the rule.

These intersection rules can be used in some very
diverse ways. One example is to use a filtering rule on
interior-line-percentage to select only those lines which are
interior to a region. The ranking rule could then be defined to
select short, high-contrast lines. The score of the ranking
rule could then be averaged to form a complex texture
measure. Alternatively, a density measure could be calculated
by counting the occurrences of lines which receive a high
score from the ranking rule and then normalising by the
size of the region.
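
The texture and density examples can be sketched as a single intersection rule. The thresholds, the ranking formula and the dictionary keys below are assumptions made for illustration; [BEL85] should be consulted for the rules actually used.

```python
def texture_measures(lines, region_size, interior_threshold=0.9, high_score=0.5):
    """lines: list of dicts with 'interior' (interior-line-percentage),
    'length' (pixels) and 'contrast' (assumed normalised to [0, 1]).
    Returns (texture, density) for one region."""
    # Filtering rule: keep only lines that are almost entirely interior.
    kept = [ln for ln in lines if ln["interior"] >= interior_threshold]
    # Ranking rule: favour short, high-contrast lines (assumed formula).
    scores = [ln["contrast"] / (1.0 + ln["length"] / 10.0) for ln in kept]
    # Combination: average score as texture, count of high scores per
    # unit region size as density.
    texture = sum(scores) / len(scores) if scores else 0.0
    density = sum(1 for s in scores if s >= high_score) / region_size
    return texture, density

lines = [
    {"interior": 1.0, "length": 5, "contrast": 0.9},   # short, strong
    {"interior": 1.0, "length": 10, "contrast": 0.8},  # longer, strong
    {"interior": 0.2, "length": 4, "contrast": 1.0},   # mostly outside
]
texture, density = texture_measures(lines, region_size=200)
```

The third line is removed by the filtering rule, so only the two interior lines contribute to the texture score, and only the first counts toward the density.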

As an additional example, the line-length-percentage
measure could be used to select lines which lie mostly on
the boundary of the region. The ranking rule could then be
defined to favor long lines. The scores from the ranking rule
could then be averaged using region-perimeter-percentage
as a weighting factor to form a simple shape measure.

A preliminary implementation of the extended rule
system has been completed, several simple texture and shape
rules have been written, and results have been obtained
on urban house scenes and on road scenes. The results
[BEL85] are quite promising. For example, we have been
able to find roads in several road scenes by using a rule
which implements a simple shape measure. In the future,
we intend to write additional rules and apply the system
to a larger variety of images, develop new rule types, add
additional representations for [illegible], depth, and surface
segmentations, and incorporate the rule-based system into
the schema system currently under development (see next
section).

II.2. SCHEMA NETWORK
REPRESENTATION

In the VISIONS system, scene independent knowledge
is represented in a hierarchical schema structure organised
as a semantic network [HAN78, WEY83, PAR80, HAN83].
The hierarchy is structured to capture the decomposition of
visual knowledge into successively more primitive entities,
eventually expressed in symbolic terms similar to those used
to represent the intermediate level description of a specific
image obtained from the region, line, and surface
segmentations. Each schema defines a highly structured collection
of elements in a scene or object; each object in the scene
schema, or part in the object schema, can have an
associated schema which will further describe it. Each schema
node has both a declarative component appropriate to the
level of detail, describing the relations between the parts of
the schema, and a procedural component describing image
recognition methods as a set of hypothesis and verification
strategies called interpretation strategies.

The schema system provides a hierarchy of memory
structures, from vertices (or even pixels) at the bottom
level through semantic objects at the top. A further
division of knowledge into long term (LTM) and short term
memory (STM) across the levels of hierarchy provides a
convenient way of differentiating the system's permanent
a priori knowledge base from the knowledge that it has
received or derived from a specific image. The goal of the
system is an interpretation, by which is meant a collection
of objects at the top level of STM that is consistent with
both the image data and the system's a priori knowledge
of the world as represented in LTM.

A central problem of high-level vision is how to make
use of knowledge, not just to categorise the results of lower
levels of computation but also to guide those levels through
the space of image analysis and feature extraction
techniques. Practical systems will need to know about an
extremely large number of objects - a prohibitive number for
any system that attempts to find each object in each image.
Furthermore, there is a computationally explosive number
of low and mid-level image operations (segmentation
algorithms, texture measures, line finders, rectangle finders,
line grouping operators, etc., which collectively are termed
'knowledge sources') which might be applicable, especially
when one realises that for almost every object there might
be a variation of certain operations that would be
particularly well suited to recognising just that object. As a result,
the combinatorics of what low- and mid-level processes to
apply and how to interpret their results is simply too great
to expect any near-term increase in the power of computing
systems to solve the problem by brute force computation.
The high level vision system must heuristically control the
work being done at the lower levels for computer vision to
ever be computationally feasible. The goal of this research,
then, is to provide a prototype knowledge driven system,
called the Schema System, to interpret images and provide
control.

The development of the schema system confronts many
of the same issues that have come up in other interpretation
and control domains, such as speech understanding [LES75,
WOO78]. Among them are questions of the knowledge
representation, the communication of information, error
recovery and the selection of knowledge sources.

A basic idea embedded in the schema system is that
the knowledge of how to recognise an object is embedded in
a schema of that object. In particular, the system has a set
of schema frames which are procedures that comprise all
of the system's knowledge that is unique to that object,
including such pieces of information as what schema frames to
start up to recognise a subpart and what knowledge sources
would be particularly effective to recognise the object. In
order to recognise a particular instance of an object in an
image, the schema frame is activated to make a schema
instance, which is a copy of the schema that is activated with
a default set of parameters. The schema instance may,
in turn, activate another schema frame to make another
schema instance, and so on down the line. During the
interpretation process, the set of active schema instances
run concurrently and exchange information by writing and
reading to/from a blackboard in order to create a single
interpretation. The interpretation is a semantic network
composed of different types of hypothesis nodes (one for
each level of memory) and three kinds of links: links
between hypotheses at one level, realisation links (between
levels), and hypothesis links (between STM and LTM).

Inter-schema communication is accomplished via a
blackboard. In general, there are three types of messages
that go on the schema blackboard: Hypotheses, Goals and
Personal Mail. An important issue is when information
is propagated, i.e. at what point does one schema
instance's hypothesis affect another. We have adopted the
basic principle that the decision whether information should
be propagated from one schema to another or not resides in
the reader (given the blackboard communication), not the
writer. A schema instance must make sure the hypothesis
has been posted by the time it is strong enough that
another schema might use it.
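
The reader-decides principle can be illustrated with a toy blackboard. This is a sequential sketch under stated assumptions, not the concurrent Common LISP implementation: only hypothesis messages are modelled, and the class names, labels and thresholds are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    author: str        # schema instance that posted the hypothesis
    label: str
    confidence: float

@dataclass
class Blackboard:
    hypotheses: list = field(default_factory=list)

    def post(self, hypothesis):
        """Writers post unconditionally; they do not choose recipients."""
        self.hypotheses.append(hypothesis)

    def read(self, accept):
        """The reader supplies `accept` and so decides what affects it."""
        return [h for h in self.hypotheses if accept(h)]

board = Blackboard()
board.post(Hypothesis("roof-instance-1", "roof", 0.4))
board.post(Hypothesis("sky-instance-1", "sky", 0.9))

# A cautious reader ignores the weak roof hypothesis ...
usable_by_house = board.read(lambda h: h.label == "roof" and h.confidence >= 0.6)
# ... while a more permissive reader chooses to act on it.
usable_by_wall = board.read(lambda h: h.label == "roof" and h.confidence >= 0.3)
```

The same posted hypothesis influences one reader and not the other, and the writer needs no knowledge of either.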

Any  system  which  manipulates  uncertain  information 
must  confront  the  problem  that  its  hypotheses  will  some¬ 
times  prove  wrong  and  must  therefore  include  mechanisms 
for  error  recovery.  This  is  particularly  a  problem  for  black¬ 
board  style  systems,  where  the  failed  hypothesis  may  have 
affected  an  unknown  number  of  other  decisions.  One  of  the 
main  features  of  this  schema  based  approach  is  that  all  of 
the  information  about  an  object  instance  resides  in  one  ob¬ 
ject  hypothesis  and  the  schema  instance  that  created  it.  In 
particular,  all  of  the  dependency  information  is  already  in 
that  schema  instance,  along  with  the  partial  results  used  to 
calculate any decisions made as a result of a dependency.
The  schema  system  therefore  assigns  to  the  schema  that 
formed  the  hypothesis  the  duty  of  maintaining  it.  When 
the  schema  has  finished  everything  else,  it  simply  goes  to 
sleep,  waiting  for  its  context  to  change.  If  the  context  does 
change,  it  wakes  up,  alters  its  hypothesis  accordingly,  and 
goes  back  to  sleep.  The  result  is  that  the  schema  system  has 
true  error  recovery  capability,  avoids  manipulating  depen¬ 
dency  lists,  and  can  use  previous  partial  results  to  calculate 
changes. 
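
A minimal sketch of this maintenance behaviour is given below. It assumes that the hypothesis confidence is a plain average of stored partial results, which is an illustrative choice only; the class and the evidence names are hypothetical.

```python
class MaintainedHypothesis:
    """A hypothesis owned and maintained by the schema instance that made it."""

    def __init__(self, label, evidence):
        self.label = label
        self.evidence = dict(evidence)  # partial results kept by the instance
        self.confidence = self._combine()

    def _combine(self):
        # Assumed combination: plain average of the partial results.
        return sum(self.evidence.values()) / len(self.evidence)

    def context_changed(self, source, new_score):
        """Wake up, revise the one affected partial result, recompute the
        hypothesis from the stored results, and go back to sleep."""
        self.evidence[source] = new_score
        self.confidence = self._combine()
        return self.confidence

roof = MaintainedHypothesis("roof", {"colour": 0.8, "shape": 0.6, "above_house": 0.7})
before = roof.confidence
# The supporting house hypothesis is withdrawn elsewhere in the system.
after = roof.context_changed("above_house", 0.1)
```

Only the affected partial result is revised; the colour and shape results are reused, and no external dependency list is consulted.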

One obvious implication of the schema system as
described is that it has no global monitor. Not only do
schemas control knowledge sources, they control themselves
in a distributed manner. There is no equivalent to the
Hearsay Focus-of-control-database & scheduler [LES75] to
decide what the system should do next. The advantage to
this is that control in the schema system is more flexible.
For example, it is easier to give a single object a unique
relationship to its subparts than it would be in a
centralised system, where the monitor might have to be altered in
non trivial ways. In fact, each schema instance may control
its own resources in a different way. This means that more
specialised knowledge can be incorporated into the control
decisions, as opposed to being forced into extremely
general methods. It also facilitates experimenting with control
methods.

The first schema prototype being developed is called
the OHM, or Object Hypothesis Maintenance schema. The
name is to emphasise that the OHM does not simply
create a hypothesis, it also maintains it as the interpretation
process proceeds. Internally, the OHM is a collection of
seven interpretation strategies (IS's), each of which runs as
its own concurrent process. The most important IS is the
OHM-control process. This strategy is responsible for
deciding how the OHM and its hypothesis relate to the rest
of the system. The remaining six IS's are Initial Hypotheses
(typically inexpensive processes that give a first estimate as
to whether the object exists in the image, and if so where),
Hypothesis Expansions (e.g. an algorithm that expands a
roof hypothesis, given just a corner of the roof),
hypothesis support, conflict resolution, negative information (in
general, how to use the information that something isn't a
particular object), and information from subparts and/or
superparts.

A programming shell has been created for research
implementation of schema sets. Schema sets are groups of
concurrent processes whose goal is to label a given type
of object, operating from high-level contextual and relation
knowledge, and intermediate feature knowledge. The
object labeling is implemented procedurally, which permits
strategies to be tailored to the object being labelled with
little interference from globally imposed data structuring.
At the same time, the lack of a global controller imposes a
great deal of structure on interprocess communication.

The purpose of the shell is to encourage development
and testing of labeling strategies by optimising research
and programming and testing time. A prototype shell is in
place and 5 object schema types are at different stages of
development and testing under the current shell. Feedback
from these preliminary schemas will lead to improvements
in the shell structure itself. The implementation is in
Common LISP on the TI Explorer, with low level data and
image processing functions handled on VAX.

II.3. INFERENCE NET

We are actively exploring the mathematical
foundations of a knowledge representation framework within the
domain of vision using the theory of evidential reasoning as
developed by Dempster [DEM68] and Shafer [SHA76].

The Dempster-Shafer formalism for evidential
reasoning supports an explicit representation of partial ignorance,
uncertainty and conflict. The inferencing model allows
"belief" or "confidence" in a proposition to be represented as a
range within the [0,1] interval. The lower and upper bounds
represent support and plausibility, respectively, of a
proposition, while the width of the interval can be interpreted as
ignorance.
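
The belief interval can be computed directly from a mass function using the standard Dempster-Shafer definitions of support and plausibility. In the sketch below the frame of discernment, the labels and the mass values are illustrative assumptions.

```python
def support(mass, proposition):
    """Lower bound: total mass of focal elements contained in the proposition."""
    return sum(m for focal, m in mass.items() if focal <= proposition)

def plausibility(mass, proposition):
    """Upper bound: total mass of focal elements that intersect the proposition."""
    return sum(m for focal, m in mass.items() if focal & proposition)

# Assumed frame of discernment and mass function for one image region.
frame = frozenset({"sky", "tree", "roof"})
mass = {
    frozenset({"sky"}): 0.5,          # evidence pointing specifically to sky
    frozenset({"sky", "roof"}): 0.2,  # evidence that cannot separate sky and roof
    frame: 0.3,                       # uncommitted mass (ignorance)
}

sky = frozenset({"sky"})
sky_interval = (support(mass, sky), plausibility(mass, sky))
tree_interval = (support(mass, frozenset({"tree"})), plausibility(mass, frozenset({"tree"})))
```

The interval for "sky" runs from 0.5 to 1.0, so half a unit of belief remains uncommitted; "tree" has no support but is still plausible to degree 0.3 because of the mass left on the whole frame.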

The representation has two components [REY85]. The
first part is static, and explicitly associates measurable
properties of some feature of the image data, via knowledge
sources, to labels which are to be assigned to abstractions of
the image data. This association is made using the notion
of a mass function as defined by Shafer. These mass
functions are generated using the notion of a possibility function
which is defined using explicit knowledge about the image
domain in question. Previous methods required that the
range of values over which the mass functions are defined
be either explicitly or implicitly discretised into "feature
propositions".

The second part uses this static representation, a frame
of discernment, and the theory of evidence as developed
by Shafer and by Lowrance to combine the mass functions
(via Dempster's rule) and arrive at a consensus opinion for
the purpose of determining the correct label of the image
abstraction. Assumptions about the image domain are
represented within the knowledge network via possibility
functions; a conflict value detects when an assumption has been
violated and is used as a representation of uncertainty within
the system.
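
Dempster's rule and the associated conflict value can be written in a few lines. The sketch below is the standard rule of combination over a discrete frame; the labels and mass values are assumptions for illustration and do not reproduce the possibility-function construction of [REY85].

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dicts from frozenset to mass).
    Returns (combined mass function, conflict)."""
    combined, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory propositions.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise the non-conflicting mass so that it sums to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict

frame = frozenset({"sky", "tree"})
from_colour = {frozenset({"sky"}): 0.6, frame: 0.4}    # assumed colour evidence
from_texture = {frozenset({"tree"}): 0.5, frame: 0.5}  # assumed texture evidence
consensus, conflict = dempster_combine(from_colour, from_texture)
```

Here the two sources disagree on 0.3 of their joint mass; a large conflict value of this kind is the signal that some assumption about the image domain may have been violated.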

Our  representation  provides  a  simple  mechanism  for 
representing  uncertain  information  and  for  pooling  of  par¬ 
tial  evidence.  Assumptions  one  makes  about  the  domain 
provide  the  constraints  on  the  relationship  between  primi¬ 
tives  extracted  from  the  image  data  and  objects  in  the  scene 
one  is  trying  to  reason  about;  we  are  interested  in  obtaining 
and  pooling  evidence  which  pertains  to  these  constraints. 
These  include  intrinsic  properties  of  the  objects,  which  are 
expressed  as  unary  constraints,  and  contextual  constraints 
such  as  spatial  relationships  which  are  binary  or  in  general 
n-ary  relations  (for  example  adjacency  is  a  binary  relation, 
betweenness  is  a  ternary  relation). 

III. INTERMEDIATE LEVEL VISION

The general strategy by which the VISIONS system
operates is to build an intermediate symbolic
representation of the image data using processes which initially do
not make use of any knowledge of specific objects in the
domain. The result is a representation of the image in
terms of intermediate primitives such as regions, lines, and
local surface patches with associated feature descriptors.
These primitives may be directly associated with an object
label (using the rule-based object hypothesis system as
described in the previous section) or they may be grouped into
more abstract descriptions. The grouping processes may
be guided by high level contextual constraints (e.g.
top-down) which effectively select certain groupings related to
the interpretation goals, they may be guided by very
general object-independent constraints (e.g. bottom-up), or
they may be guided by both, changing their form
depending on the constraints available.

In this section we summarise three areas of research
whose focus is the construction of intermediate level
primitives and their features.

III.1. GEOMETRIC GROUPING OF
STRAIGHT LINES

The extraction of lines based on significant intensity
changes and perceived boundaries between areas is a
difficult and important step in image understanding. We have
developed a new approach to the extraction of straight lines
based on geometric grouping. The primary goal is the
extraction of straight lines from images in which there are
fragmented intensity discontinuities. The secondary goal
is the demonstration that the use of geometric
organisation is an important part of the line extraction process and
therefore can produce improvements when combined with
standard edge detection techniques.

The  algorithm  has  two  major  components:  edge  detec¬ 
tion  and  hierarchical  grouping.  Hierarchical  grouping  has 
two  steps  which  are  performed  at  each  level:  linking  and 
merging. 

There are many edge detection algorithms which might
be used. The main requirements are that it produce
measurements of the intensity contrast and direction of the
edge. The two algorithms which we have used for selecting
points are zero crossings of the Laplacian operator [MAR80,
CAN83] and the Haralick operator [HAR84].

The hierarchy is based on scale but there is no
smoothing. The hierarchical representation has a number of
advantages. It is a compact representation which reduces the
search space at each level for sequences of linked edges. It
reflects the observation that "closeness" of lines is scale
dependent and is a multi-scale representation of a line which
may be straight only at large scales.

The linking process is based on intrinsic and geometric
properties. It searches a space of lines for almost collinear
pairs which are close to each other and links the appropriate
endpoints. There are four criteria used for linking:

1. Similar gradient magnitude. The gradient magnitudes
across the edge must be close to each other and in the
same direction.

2. The lines must be approximately collinear. Lines 180
degrees apart are not linked.

3. The end points of two candidate lines must be close.

4. The lines must not overlap. If both endpoints of one
line project within corresponding endpoints of the other,
they are not linked.
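
The four criteria above can be sketched as a single pairwise test. This is only an illustrative sketch, not the implementation described in the report: the `Segment` type, the function names, and all threshold values are assumptions. Segments are assumed to be directed so that the direction encodes the sign of the contrast, which is why two lines 180 degrees apart fail the angle test. The collinearity check here compares directions only; lateral offset is bounded indirectly by the endpoint-gap test.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Directed line segment with the mean gradient magnitude across it.

    Hypothetical type for illustration; the direction from (x1, y1) to
    (x2, y2) is assumed to encode the sign of the intensity contrast.
    """
    x1: float
    y1: float
    x2: float
    y2: float
    grad_mag: float


def _direction(s):
    return math.atan2(s.y2 - s.y1, s.x2 - s.x1)


def _projects_inside(inner, outer):
    """True if both endpoints of `inner` project within the extent of `outer`."""
    dx, dy = outer.x2 - outer.x1, outer.y2 - outer.y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return False
    for px, py in ((inner.x1, inner.y1), (inner.x2, inner.y2)):
        t = ((px - outer.x1) * dx + (py - outer.y1) * dy) / length_sq
        if not 0.0 <= t <= 1.0:
            return False
    return True


def can_link(a, b, max_mag_ratio=1.5, max_angle_deg=10.0, max_gap=5.0):
    """Apply the four linking criteria; all thresholds are assumed values."""
    # 1. Similar gradient magnitude.
    low, high = sorted((a.grad_mag, b.grad_mag))
    if low <= 0.0 or high / low > max_mag_ratio:
        return False
    # 2. Approximately collinear; directed segments 180 degrees apart fail.
    diff = (_direction(a) - _direction(b) + math.pi) % (2.0 * math.pi) - math.pi
    if math.degrees(abs(diff)) > max_angle_deg:
        return False
    # 3. Some pair of endpoints must be close.
    ends_a = ((a.x1, a.y1), (a.x2, a.y2))
    ends_b = ((b.x1, b.y1), (b.x2, b.y2))
    gap = min(math.hypot(px - qx, py - qy)
              for px, py in ends_a for qx, qy in ends_b)
    if gap > max_gap:
        return False
    # 4. The lines must not overlap.
    if _projects_inside(a, b) or _projects_inside(b, a):
        return False
    return True
```

In a hierarchical scheme of the kind described, `max_gap` would presumably grow with the level of the hierarchy, reflecting the observation that closeness is scale dependent.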

The merging process consists of grouping and replacement.
If a sequence of linked lines can be approximated
sufficiently well by a straight line, then they are grouped
and are replaced by a straight line.
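
As a rough illustration of the merging step, the sketch below fits a straight line to the endpoints of a chain of linked segments and replaces the chain only if every endpoint lies within a tolerance of the fitted line. The report does not say how "sufficiently well" is measured; the total-least-squares fit, the maximum-residual test, and the `tol` value are all assumptions made for this example.

```python
import math


def fit_line(points):
    """Total-least-squares line through points: returns (centroid, angle)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Orientation of the principal axis of the point scatter.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), theta


def merge_chain(chain, tol=1.0):
    """Replace a chain of linked segments by one straight segment, or None.

    `chain` is a list of ((x1, y1), (x2, y2)) segments; `tol` is an assumed
    maximum perpendicular distance in pixels.
    """
    pts = [p for seg in chain for p in seg]
    (cx, cy), theta = fit_line(pts)
    ux, uy = math.cos(theta), math.sin(theta)
    # Perpendicular distance of every endpoint from the fitted line.
    worst = max(abs(-(x - cx) * uy + (y - cy) * ux) for x, y in pts)
    if worst > tol:
        return None
    # Extent of the chain along the fitted line.
    ts = [(x - cx) * ux + (y - cy) * uy for x, y in pts]
    t0, t1 = min(ts), max(ts)
    return ((cx + t0 * ux, cy + t0 * uy), (cx + t1 * ux, cy + t1 * uy))
```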

This approach has a number of advantages for extracting
straight lines:

1. It links line segments even when they are separated by
gaps.

2. Since it is based on gradient information and spatial
information it can find lines which have low contrast
as well as high.

3. Since it favors collinear line segments and uses gradient
information, it is less sensitive to texture when extracting
boundary lines. Zero crossings and algorithms
which track edges are not able to distinguish between
edges which are part of long, straight boundaries and
lines of texture because they do not use geometric context.

The results shown in [WEI85] indicate that the principle
of geometric grouping for extracting long straight lines
gives significant improvement in the results obtained from
standard edge detection algorithms. We plan to demonstrate
this over a wider range of images. This implementation
is also a demonstration of the importance of geometric
grouping in general; we plan to extend it to curved lines
(see Section III.2), parallel lines, closed contours, and other
geometric abstractions.

Although the algorithm is very robust in its extraction
of straight lines, it has some problems which we are continuing
to investigate. The ability of the algorithm to bridge
gaps is simultaneously one of its strengths and one of its
weaknesses. Gaps sometimes appear in a line because of
changes in the lighting conditions along the line (such as
shadows and specular reflections), which in turn affect the
magnitude of the gradient. These gaps should be bridged.
Other apparent gaps are caused by the alignment of distinct
lines (such as those on the top or bottom of a pair
of shutters); such gaps are real and should not be bridged,
yet at some level in the hierarchical representation they
appear as one line. Methods must be found for analyzing the
multi-scale representation and for determining which scales
are appropriate and which are not.

The algorithm, like many others, relies on intensity
gradient information to link lines, yet what we perceive as
straight lines are not always collections of edges with similar
intensity gradients. Finally, the algorithm will often
find long lines in heavily textured areas because of accidental
alignment of texture edges. We are investigating the
possibility of using texture measures to inhibit the linking
step.

III.2. EXTRACTION OF CURVED LINES

Until recently the traditional method in computer vision
for extracting straight and curved lines has been either
through the use of the Hough transform or via "edge
linking" algorithms applied to the output of some "significant
edge pixels" algorithm. However, a novel approach
for extracting straight lines was recently reported in Burns,
Hanson and Riseman [BUR84], and involved a simple local
computation (not involving any histogram methods) followed
by a computation of connected components.

The central module of this algorithm was a grouping
process using overlapping partitions on gradient orientation.
In the context of extracting straight lines this process
can be summarized as follows:

• Apply a gradient orientation measure to the image,

• Partition the output of this measure into overlapping
sectors (normally 16 are used) and label the image
according to the sector into which the gradient orientation
falls,

• Apply a connected components algorithm to each
non-overlapping partitioning,

• Apply a selection procedure to determine locally which
partition is preferred (the "edge support"),

• Fit a straight line to the resulting "edge support"
regions.
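
The labelling and connected-components steps above can be sketched as follows. The sketch assumes that the 16 overlapping sectors are realized as two non-overlapping partitionings of 8 sectors each, the second rotated by half a sector width; that arrangement, the function names, and the use of 4-connectivity are assumptions for illustration rather than details given in this report.

```python
from collections import deque


def sector_labels(theta_deg, sectors_per_partition=8):
    """Label an orientation in two partitionings offset by half a sector."""
    width = 360.0 / sectors_per_partition
    t = theta_deg % 360.0
    first = int(t // width)
    second = int(((t + width / 2.0) % 360.0) // width)
    return first, second


def connected_components(labels):
    """4-connected regions of equal label in a 2D list (None = no edge)."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or labels[r][c] is None:
                continue
            lab, region, queue = labels[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and labels[ny][nx] == lab):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((lab, region))
    return regions
```

Orientations of 44 and 46 degrees fall on either side of a sector boundary in the first partitioning but share a sector in the second, which is the reason for using overlapping partitions: a line whose orientation sits near a boundary is fragmented in one partitioning and kept whole in the other.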


We have been investigating the application of this general
approach to the problem of extracting semi-circular
arcs, replacing gradient orientation with a curvature measure.
Specifically we find "curve support regions" which are
uniform with respect to a curvature measure and as such
can be abstracted from the image data as a part of a circle.

The curvature measure we are using is given by the
Kitchen-Rosenfeld curvature operator [KIT80], defined by:

$$K = \frac{I_{xx} I_y^2 + I_{yy} I_x^2 - 2 I_{xy} I_x I_y}{(I_x^2 + I_y^2)^{3/2}}$$

In fact this measure only makes sense when applied
to areas of locally maximum gradient magnitude, i.e. zero
crossings of some second derivative operator. Thus our
algorithm can be summarized as follows:

• Apply the curvature measure along the zero-crossing
contour.

• Partition the range of the curvature measure into
overlapping sectors.

• For each partition, label each pixel according to the
partition into which the curvature value falls.

• Produce regions by applying a connected components
algorithm to the labels of the curvature partitions.

• Fit a semi-circle to each curve support region.

• Each pixel then votes for the region whose extracted
curve is longest. The percentage of pixels within a
region that vote for that region is the support of the
region.

• Normally the regions selected are those whose support
is greater than 50 percent.
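
The first step of the list above, evaluating the curvature measure at a pixel, might look like the sketch below. It assumes central finite differences for the image derivatives and the 3/2-power normalization of the squared gradient magnitude, which yields the curvature of the iso-intensity contour through the pixel; the function name and the `eps` guard are hypothetical.

```python
def kr_curvature(img, x, y, eps=1e-12):
    """Curvature of the iso-intensity contour at interior pixel (x, y).

    `img` is a 2D list indexed as img[y][x]. Derivatives are estimated
    with central differences (an assumed discretization).
    """
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    ixx = img[y][x + 1] - 2.0 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2.0 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    g2 = ix * ix + iy * iy
    if g2 < eps:
        # No gradient, so the contour direction is undefined.
        return 0.0
    return (ixx * iy * iy + iyy * ix * ix - 2.0 * ixy * ix * iy) / g2 ** 1.5


# Intensity x^2 + y^2 has circular contours centred on the origin, so the
# curvature at a pixel is the reciprocal of its distance from the origin.
image = [[float(x * x + y * y) for x in range(9)] for y in range(9)]
curvature_at_3_4 = kr_curvature(image, 3, 4)
```

For the synthetic image above the pixel (3, 4) lies at distance 5 from the origin, so the measure evaluates to 1/5. In the algorithm described, the measure would only be evaluated along zero-crossing contours.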

Associated with each curve is a set of curve attributes,
such as length, center, radius, endpoint parameters, contrast
and support. The algorithm is local in nature and is
robust in the face of moderate amounts of noise due to the
coarse partitioning of the output curvature measure.

In summary, we have developed a system to derive
local and piecewise circular descriptors of the image data.
The approach utilizes local 2D operations and is computable
in parallel. Curves are partitioned based on constancy of
curvature rather than the usual extrema methods. Also in
contrast to other approaches, descriptors of neighboring
segments are treated as independent, with the expectation that
higher level processes will guide the next level of grouping.
The system is designed to provide reliable local primitives
(as opposed to pixel level events) for the purpose of moving
up the abstraction hierarchy within the image understanding
system.




III.3. APPLICATION OF VANISHING POINTS TO 3-D MEASUREMENT

Perspective is an important cue to 3D spatial information
such as the direction of lines or the orientation of
surfaces. Human beings can perceive three-dimensional
objects in space even when looking at two-dimensional images.
A computer vision system must do likewise, but 3D shape,
size and location cannot be recovered from a single image
without additional information. Vanishing points and vanishing
lines can provide this information in the case of objects
which have parallel lines or edges on planar surfaces.
Once the location of the vanishing point is detected, we can
use it as a cue to calculate the distance and shape of the
object to which the parallel lines belong [NAK80, NAK84a,
BAR82].

Estimation of the errors in these features has practical
significance and could be used in many ways. With
some object models such as buildings, the dihedral angles
between their surfaces (e.g. walls and roofs) are invariant
shape features. An estimate for the relative orientation
between the surfaces which incorporates the error allows one
to verify hypotheses about image regions which correspond to
planar surfaces. For example, when analyzing a house scene, if
two adjacent regions are temporarily labeled as house walls
based on some property, say rectangular shape, the calculation
of the mutual angle of the two surfaces can be used
to verify this. If the angle is calculated to be 90 degrees, it
will support the labelling as walls. However, we also want
to know when we should reject the hypothesis. This means
that we must know whether the measured angle is outside
the estimated range of error. A modular process such as a
knowledge source on perspective could use this type of
information in the form of constraints to generate and verify
hypotheses. In a bottom-up approach, the range of error
could be used to limit the search space when looking for
planes which are perpendicular.

Parallel lines in 3D space are projected onto the image
plane to lines which radiate from a single common point,
called a vanishing point (VP). It can be used to calculate
the size and orientation of objects with parallel lines. The
vanishing line for a surface can be computed as the line
passing through two VPs obtained from two sets of parallel
lines. There are infinitely many sets of parallel lines which
could be drawn in a given plane, and the vanishing point for
each set lies on this vanishing line.
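
As a small worked illustration of the two constructions just described, the sketch below uses homogeneous coordinates: the line through two image points and the intersection of two image lines are both cross products. This is a textbook formulation chosen for the example, not necessarily the computation used in the cited work, and the function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1, l2, eps=1e-12):
    """Intersection of two image lines, or None if they are parallel.

    For the projections of two parallel 3D lines this point is their
    vanishing point; None means the vanishing point is at infinity.
    """
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Two image lines that converge at (2, 2).
vp1 = intersection(line_through((0, 0), (1, 1)), line_through((0, 4), (1, 3)))
# A second vanishing point for another set of parallel lines in the same
# plane (hypothetical value); the vanishing line passes through both.
vp2 = (6.0, 2.0)
vanishing_line = line_through(vp1, vp2)
```

With noisy line estimates the pairwise intersections of more than two lines will not coincide, which is one motivation for the accumulator-based approach and the error analysis discussed in this section.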

The surface orientation of a plane is given by the unit
normal vector perpendicular to the surface. The vanishing
line (VL) of a plane gives a precise description of the unit
normal. The distance from the VL to the center of the image
plane corresponds to the angle of tilt of the surface away
from the viewer. If the line goes through the center, then
the normal to the surface is parallel to the viewing plane.
The second angle of the surface is given by the orientation of
the vanishing line; its normal is the projection of the normal
to the surface onto the image plane. Thus, we analyze how
the errors in the distance and orientation of the vanishing
line affect the estimates for the surface orientation.
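
The relationship between the vanishing line and the surface normal can be made concrete under a simple pinhole model. The sketch assumes image coordinates measured from the image center, a camera at the origin looking along the z axis, and an image plane at distance f (the focal length). A plane whose vanishing line is a*x + b*y + c = 0 then has a normal proportional to (a, b, c/f). The function names are hypothetical and the sign of the normal is left ambiguous.

```python
import math


def plane_normal_from_vanishing_line(a, b, c, focal_length):
    """Unit normal of a plane from its vanishing line a*x + b*y + c = 0."""
    n = (a, b, c / focal_length)
    norm = math.sqrt(sum(v * v for v in n))
    return tuple(v / norm for v in n)


def slant_deg(normal):
    """Angle between the normal and the optical axis, folded into [0, 90]."""
    return math.degrees(math.acos(min(1.0, abs(normal[2]))))


# Vanishing line y = 500 with f = 500: the line lies at distance f from
# the image center, so the normal makes 45 degrees with the optical axis.
n_tilted = plane_normal_from_vanishing_line(0.0, 1.0, -500.0, 500.0)
# Vanishing line through the image center: the normal has no component
# along the optical axis, i.e. it is parallel to the image plane.
n_edge_on = plane_normal_from_vanishing_line(0.0, 1.0, 0.0, 500.0)
```

In this model the distance d from the image center to the vanishing line satisfies tan(slant) = f / d, and the direction (a, b) perpendicular to the vanishing line is the projection of the normal onto the image plane, matching the two angles described in the text.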


We have done an analysis of how errors in the location
of these vanishing points affect our estimate of surface
orientation and line length [NAK84b]. We have developed
formulae for these errors as a function of VP or VL errors,
and we have used constraints based on real world knowledge
to increase the precision of the estimates of surface
orientation.

The assumptions made are that the focal length of the
camera and the depth of one point on a line are known. If
the depth is not known at all, then the orientation can still
be recovered, but only relative distances can be estimated.

The algorithm for locating vanishing points is a form
of Hough transform:

1. Extract lines which are likely to be parallel

2. Project those lines stereographically onto half of the
Gaussian sphere and extend them to semicircles

3. Locate peaks by thresholding

The estimation of the surface normal from the estimates
for two vanishing points involves intersecting constrained
regions on the Gaussian sphere:

1. Each vanishing point estimate, which is an area on the
Gaussian sphere, determines an annular set of possible
directions for the normal to the surface.

2. The intersection of these annular sets is the estimate
for the normal.

For the application of geometric constraints in the case
where two house walls are perpendicular, the estimate for
the normal for one wall was rotated 90 degrees on the Gaussian
sphere and intersected with the estimate for the normal
to the other wall. In our experiments, there was significant
reduction in the size of the region estimate for the surface
normal in the example used.

Although we assumed the perpendicularity between
the planes in the two cases mentioned above, we can apply
this method even if the dihedral angle is not a right
angle. If the angle is given to be θ, the other plane's
normal must be in the belt that makes an angle θ with
the given normal. We can again form the consistent range
by taking the intersection of the belt and the area for the
VP. These constraints can also be applied to more than two
planes, for example when three planes meet in a trihedral
angle.
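
The belt constraint can be illustrated with sampled direction estimates. In the sketch below each region on the Gaussian sphere is represented by a list of candidate normal vectors (an assumed discretization), and a candidate for the second plane is kept only if it makes the given dihedral angle, within a tolerance, with at least one candidate for the first plane. The function names and the tolerance are hypothetical, and the sign ambiguity of the normals is ignored.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(cos_angle))


def constrain_normals(candidates_a, candidates_b, dihedral_deg, tol_deg):
    """Keep candidates for plane B lying in the belt around plane A's normals."""
    return [nb for nb in candidates_b
            if any(abs(angle_between_deg(na, nb) - dihedral_deg) <= tol_deg
                   for na in candidates_a)]


# One candidate normal for the first wall and three for the second.
wall_a = [(1.0, 0.0, 0.0)]
wall_b = [(0.0, 1.0, 0.0), (0.7071, 0.7071, 0.0), (0.05, 1.0, 0.0)]
consistent = constrain_normals(wall_a, wall_b, dihedral_deg=90.0, tol_deg=5.0)
```

Only the candidates close to perpendicular to the first wall survive, which corresponds to the reduction in the size of the region estimate reported above.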


IV. THE UMASS IMAGE UNDERSTANDING
ARCHITECTURE PROJECT

UMass is designing and constructing a highly parallel
architecture for computer vision with the goal of achieving
real-time processing rates for low, intermediate and high
level image interpretation tasks. This architecture consists
of three tightly coupled layers that correspond to these levels
of abstraction. These layers are the Content Addressable
Array Parallel Processor (CAAPP) at the bottom, the Intermediate
and Communications Associative Processor (ICAP) in
the middle, and the Symbolic Processing Array (SPA) on
top. Attached to the SPA is a host processor.

The CAAPP is an associative square grid processing
array that is designed to provide bi-directional parallel
communication between symbolic and sensory processing [WEE83,
WEE84, LEV84]. The ICAP is also an associative square
array, and is tightly coupled to the CAAPP and SPA. The
purpose of the ICAP is to perform intermediate level symbolic
processing and to facilitate the flow of information
and control between the CAAPP and SPA. The SPA is an
array of processors which perform high level symbolic processing
such as hypothesis generation and testing, schema
processing, and knowledge source/blackboard processing.

The multilayer associative structure of the UMass
architecture provides simultaneous parallelism at three different
levels of abstraction with high bandwidth bi-directional
flow of information and control between the levels. This
permits the entire iconic to symbolic transformation process to
take place within the architecture so that the top layer can
provide a high level symbolic interface to the image
interpretation process. At this level, images of the environment
have essentially been transformed into a symbolic
representation of that environment.

The effort involves a custom VLSI implementation for
the processing elements in the bottom two layers of the
architecture; a systems hardware implementation for integrating
the custom processors with off-the-shelf components in
the top layer and host processor; a software development effort
for creating a complete programming environment, simulators
and tools for the system; and an algorithms effort for
implementing vision algorithms on the architecture. Much of
the hardware effort and part of the software and algorithms
efforts will be shared with Hughes Research Labs.

IV.1. Hardware:

A test chip of the NMOS version of the CAAPP processing
element has just been received from the MOSIS facility.
We are currently preparing to test this chip.

The layout for a CMOS version of the CAAPP processing
element is about 60 percent complete. We will be
examining the tradeoffs involved in going to a CMOS
implementation. Although CMOS would increase the size of the
layout, it would permit the use of the MOSIS scalable rules,
with a potential for significant size reduction and speed
increase in the future.

The first pass on the design for the Intermediate and
Communications Associative Processor (ICAP) has been
completed. Unfortunately, to place this ICAP design on
the same chip as the CAAPP cells will necessitate a greater
number of pins than is currently available from MOSIS. We
are thus examining the tradeoffs of reducing the functionality
of the ICAP to make it fit the pin limitations, versus
placing the ICAP on a separate chip. The latter would
double the size of the prototype circuit boards, but would
provide greater processing flexibility.

IV.2. Software and Algorithms:

We are currently constructing an instruction-level functional
simulator for the new CAAPP architecture. This simulator
promises to provide considerably greater execution
speed than the old simulator. The new simulator is being
written in C as a stand-alone, portable system, although its
image formats will be compatible with the UMass VISIONS
system. Once the ICAP architecture is finalized, it will also
be incorporated into the simulator.

An iconic to symbolic transformation process has been
developed and tested for the CAAPP, using a version of
the VISIONS system to simulate the new CAAPP architecture,
prior to construction of the new simulator. Several
vision algorithms have been implemented in the simulator;
these include an algorithm for computing approximations
to large Gaussian convolutions, the Burns line extraction
algorithm, and the line grouping algorithm described in
Section III.1.

A prototype slice of the UMass architecture is scheduled
for completion in approximately 2 years. This will
produce a symbolic representation of region and line image
events, as well as surfaces, and can be interfaced to
a LISP processor as a demonstration of the concept. The
complete prototype could be built in two additional years.
At the end of the first year the software effort will produce
simulators and tools for the bottom two layers of the
architecture. The second year of the software effort will result
in a transportable, stand-alone simulator for the entire
architecture with associated environment and tools. After
this, the software effort will concentrate on implementing
vision processing tasks on the simulators and then transferring
those implementations to the hardware as it becomes
available. Additional enhancements to the environment and
further tools will be developed as necessary.

REFERENCES 

[ADI85a] G. Adiv, "Determining 3-D Motion and Structure
from Optical Flow Generated by Several Moving Objects,"
IEEE Trans. Pattern Anal. Machine Intell., Volume PAMI-7,
July 1985, pp. 384-401.

[ADI85b] G. Adiv, "Interpreting Optical Flow," Ph.D.
Dissertation, Computer and Information Science Department,
University of Massachusetts at Amherst, September 1985.

[ANA84] P. Anandan, "Computing Dense Displacement
Fields with Confidence Measures in Scenes Containing
Occlusion," SPIE Intelligent Robots and Computer Vision
Conference, Volume 521, 1984, pp. 184-194; also DARPA IU
Workshop Proceedings, 1984; and COINS Technical Report
84-32, University of Massachusetts at Amherst, December
1984.




[ANA85] P. Anandan and R. Weiss, "Introducing a Smoothness
Constraint in a Matching Approach for the Computation
of Optical Flow Fields," Proc. of the Third Workshop
on Computer Vision: Representation and Control, October
1985, pp. 186-196; also in DARPA IU Workshop Proceedings,
1985.

[BAR82] S.T. Barnard, "Interpreting perspective images,"
SRI Technical Note 271, 1982.

[BEL85] R. Belknap, E. Riseman, and A. Hanson, "The
Information Fusion Problem and Rule-Based Hypotheses
Applied To Complex Aggregations of Image Events," Proc.
DARPA IU Workshop, Miami Beach, FL, December 1985.

[BHA85] R. Bharwani, A. Hanson, E. Riseman, "Refinement
of Environmental Depth Maps over Multiple Frames," Proc.
DARPA IU Workshop, Miami Beach, FL, December 1985.

[BUR84] J.B. Burns, A. Hanson, and E. Riseman, "Extracting
Linear Features," Proc. 7th ICPR, Montreal, 1984. Also
COINS Technical Report 84-29, August 1984. To appear in
IEEE PAMI.

[CAN83] J.F. Canny, "Finding Edges and Lines in Images,"
MIT AI Lab Technical Report No. 720, June 1983.

[DEM68] A.P. Dempster, "A Generalization of Bayesian
Inference," Journal of the Royal Statistical Society, Series B,
Vol. 30, 1968, pp. 205-247.

[GLA83] F. Glazer, G. Reynolds, and P. Anandan, "Scene
Matching by Hierarchical Correlation," Proc. IEEE CVPR,
June 1983, pp. 432-440.

[HAN78] A. Hanson and E. Riseman, "VISIONS: A Computer
System for Interpreting Scenes," Computer Vision Systems
(A. Hanson and E. Riseman, eds.) (1978), 303-333,
Academic Press.

[HAN83] A. Hanson and E. Riseman, "A Summary of Image
Understanding Research at the University of Massachusetts,"
COINS Technical Report 83-35 (October 1983), University
of Massachusetts at Amherst.

[HAR84] R.M. Haralick, "Digital Step Edges from Zero Crossing
of Second Directional Derivatives," IEEE Trans. PAMI 6,
January 1984, pp. 58-68.

[HOR81] B.K.P. Horn and B.G. Schunck, "Determining Optical
Flow," Artificial Intelligence, Volume 17, 1981, pp.
185-203.

[KIT80] L. Kitchen and A. Rosenfeld, "A Gray Level Corner
Detector," Tech. Report No. 887, Computer Science Center,
University of Maryland, College Park, MD, 1980.

[LAW84] D.T. Lawton, "Processing Dynamic Image Sequences
from a Moving Sensor," Ph.D. Dissertation (TR 84-05),
Computer and Information Science Department, University of
Massachusetts, 1984.

[LES75] V.R. Lesser, R.D. Fennell, L.D. Erman, and D.R.
Reddy, "Organization of the Hearsay-II Speech Understanding
System," IEEE Trans. on ASSP 23, pp. 11-23.


[LEV84] S.P. Levitan, "Parallel Algorithms and Architectures:
A Programmer's Perspective," Ph.D. Dissertation
(COINS Technical Report 84-11), Computer and Information
Science Department, University of Massachusetts, May
1984.

[LOW82] J. Lowrance, "Dependency Graph Models of Evidential
Support," Ph.D. Thesis, University of Massachusetts,
Amherst, 1982; also COINS Technical Report No. 82-26.

[MAR80] D. Marr and E. Hildreth, "Theory of Edge Detection,"
Proc. of the Royal Society of London, B, 207, pp.
187-217.

[NAK80] H. Nakatani, et al., "Extraction of vanishing point
and its application to scene analysis based on image
sequence," 5th Int. Conf. on Pattern Recognition, pp. 370-372,
1980.

[NAK84a] H. Nakatani, T. Kitahashi, "Inferring 3-d shape
from line drawings using vanishing points," 1st Intnl Conf.
on Computers and Applications, 1984.

[NAK84b] H. Nakatani, R. Weiss, and E. Riseman,
"Application of Vanishing Points to 3D Measurement," Proc.
SPIE, Vol. 507, 1984, pp. 164-168.

[PAR80] C.C. Parma, A.R. Hanson and E.M. Riseman,
"Experiments in Schema-Driven Interpretation of a Natural
Scene," COINS Technical Report 80-10 (April 1980),
University of Massachusetts at Amherst.

[PAV85] I. Pavlin, A. Hanson, and E. Riseman, "Analysis of
an Algorithm for Detection of Translational Motion," Proc.
DARPA IU Workshop, Miami Beach, FL, December 1985.

[REY85] G. Reynolds, D. Strahman, N. Lehrer, "Converting
Feature Values to Evidence," Proc. DARPA IU Workshop,
Miami Beach, FL, 1985.

[SHA76] G. Shafer, A Mathematical Theory of Evidence,
Princeton University Press, 1976.

[STR84] T. Strat, "Continuous Belief Functions for Evidential
Reasoning," Proc. AAAI-84, pp. 303-313.

[ULL79] S. Ullman, The Interpretation of Visual Motion,
MIT Press, Cambridge, MA, 1979.

[WEE83] C. Weems, S. Levitan, D. Lawton, and C. Foster,
"A Content Addressable Array Parallel Processor and Some
Applications," Proc. DARPA IU Workshop, Arlington, VA,
June 1983.

[WEE84] C. Weems, S. Levitan, C. Foster, E. Riseman,
D. Lawton, A. Hanson, "Development and Construction of a
Content Addressable Array Parallel Processor (CAAPP) for
Knowledge-Based Image Interpretation," Proc. Workshop
on Algorithm-Guided Parallel Architectures for Automatic
Target Recognition, Leesburg, VA, July 16-18, 1984, pp.
329-359.

[WEI85] R. Weiss, A. Hanson, and E. Riseman, "Geometric
Grouping of Straight Lines," Proc. 1985 DARPA IU
Workshop, Miami Beach, FL, 1985.


[WES82] L. Wesley and A. Hanson, "The Use of an Evidential-Based
Model for Representing Knowledge and Reasoning
about Images in the VISIONS System," Proc. Workshop on
Computer Vision, Rindge, NH, August 23-25, 1982.

[WES83] L. Wesley, "Reasoning about Control: The Investigation
of an Evidential Approach," Proc. 8th IJCAI, Karlsruhe,
West Germany, August 1983, pp. 203-210.

[WES85] L. Wesley, Ph.D. Thesis, University of
Massachusetts, Amherst, in preparation.

[WEY83] T.E. Weymouth, J.S. Griffith, A.R. Hanson and
E.M. Riseman, "Rule Based Strategies for Image Interpretation,"
Proc. of AAAI-83 (August 1983), 429-432, Washington
D.C. A longer version of this paper appears in Proc. of
the DARPA Image Understanding Workshop (June 1983),
193-202, Arlington, VA.

[WEY85] T.E. Weymouth, "Using Object Descriptions in a
Schema Network For Machine Vision," Ph.D. Dissertation
(in progress), Computer and Information Science Department,
University of Massachusetts, Amherst.

[WOO78] W.A. Woods, "Theory Formation and Control in a
Speech Understanding System with Extrapolation Towards
Vision," in Computer Vision Systems (A. Hanson and E.
Riseman, Eds.), Academic Press, 1978.


LCDR J. D. McKendrick
Matthew Lybanon

Naval Ocean Research and Development Activity
Code  321 

NSTL,  Mississippi  39529-5004 


ABSTRACT 

Satellite imagery of the oceans is adding synoptic coverage to in
situ measurements of traditional oceanography. Except for SEASAT
and the Navy GEOSAT, almost all remote measurements have been
made from weather satellites. In a few years, the Navy N-ROSS
satellite is to place in orbit four microwave instruments designed to
make oceanographic measurements. The deluge of oceanographic data
provided by N-ROSS and other satellites will pose serious problems
for operational interpreters, which will be added to those caused by
the impracticality of automated interpretation and the uneven quality
of human interpretation. Expert systems that incorporate a
knowledge base and inference mechanism offer a possible solution
to the dilemma. Expert systems may be able to advise inexperienced
interpreters and, eventually, may lead to automation of the advisement
process. A study of the problem area recommended the
implementation of a prototype expert system that knows about mesoscale
ocean features. That system could be used to organize knowledge
about oceanographic image understanding, which, in turn, could be
used to develop a more powerful expert system.


INTRODUCTION 

Traditional oceanography has relied on in situ instruments -
thermometers, salinometers, conductivity meters, etc. - to measure
oceanic parameters. While those measurements typically have provided
data as a function of depth, it has been difficult to make horizontal
measurements at a close enough spatial and temporal spacing to
reveal time varying mesoscale features. Satellite observations have
added that new horizontal dimension. Those observations, in
particular, give information about the surface layer of the ocean, while
their synoptic coverage often provides key indicators concerning
oceanographic mesoscale changes. Satellite imagery allows for a more
complete understanding and large-scale context for the incorporation
of isolated, in situ, measurements.

A great deal of satellite oceanography has used infrared (IR) and
visible imagery from meteorological satellites. While the sensors in
those satellites were not designed for oceanographic measurements,
and despite problems such as cloud cover and signal attenuation by
the atmosphere, oceanographers have made good use of the
observations. A few satellites have been designed specifically for
oceanography. Among these are SEASAT, which, unfortunately, failed
about three months after launch in 1978, and the U.S. Navy GEOSAT,
which placed a microwave altimeter in orbit in March 1985.


The Navy Remote Ocean Sensing System (N-ROSS) is to be another
dedicated ocean sensing satellite and is scheduled for a 1990 launch.
It will carry four microwave instruments to make all-weather ocean
observations. Those instruments are an altimeter, a low frequency
microwave radiometer (LFMR), a scatterometer (NSCAT), and a
special sensor microwave/imager (SSM/I). The primary ocean
parameters to be measured by each sensor include:

•  altimeter—  sea  surtace  dynamic  topography  (mesoscaie  teatures), 

•  Lh.MR  sea  surface  temperature, 

•  NSCAT  -surface  windspetd  and  direction, 

9  vSM.1  —  Windspcrd  and  sea  ice. 

iNote  Ihe-  altimeter  and  NSCAT  are  nons  anning'nonimagmg,  but 
the  LEMR  and  SSM/1  are  scanning/imagmg.  i  The  st'eam  of  data  from 
N  ROSS,  when  added  to  that  from  other  satellites  (including  other 
pianrwd  new  satellites),  will  inundate  the  oceanographic  remote  sens¬ 
ing  analyst  with  data. 

At present, the Navy must use a human/machine mix to exploit
satellite data to provide oceanographic information to the Fleet. But
human interpretation is of uneven quality and is labor-intensive.
Because of the subjective nature of the human interpretive process
and the varying skill levels of the interpreters, it clearly makes sense
to strive to standardize and to optimize the interpretation function.
Then the quality of the products would be less sensitive to
inexperience, fatigue, and similar factors. It would be useful to transfer
existing laboratory expertise in image enhancement techniques and
other machine aids to the operational interpreter in the field. This
would be helpful in operational centers, which must meet operational
schedules, sometimes with inexperienced personnel.

Conventional automated techniques for satellite data interpretation
(e.g., standard signal processing techniques, image segmentation and
classification) do not significantly lend themselves to satellite
oceanography for several reasons. Among these are: ocean features
are time varying, no efficient mathematical characterization of the
features generally exists, and images are frequently cloud covered or
otherwise contaminated with noise [1]. An example of the latter
problem is that the surface thermal signature (in IR imagery) of a
cold-core eddy may be masked by solar heating of the surface layer or
obscured by a humid marine boundary layer.

However, at NORDA we have found that interpretation by human
experts using interactive image processing techniques frequently works
well despite these problems. It also appears that an automated
approach that uses methods similar to those of human experts would
overcome both the problems of conventional automated analysis and
the problems of human interpretation discussed above. This suggests
that the prospect for an oceanographic image analyst expert system
offers a potential solution to the problem of operationally obtaining
oceanographic information from satellite observations.

Some potential payoffs of the application of expert systems
technology to the above problems are the expectation of raising the
performance of less experienced analysts to expert level, and the
longer term possibility that a computer program may ultimately be
able to replace the human analyst. The latter goal, if achieved, would
replace the current subjective nature of ocean image interpretation
and make it objective.

Figure 1 is a high resolution (1.1 km x 1.1 km pixel) NOAA-7
IR (channel 1) image of a portion of the Gulf Stream area southeast
from Nantucket. Viewing conditions were particularly clear, and
following a linear contrast enhancement the image has been further
enhanced by a modified Chen-Frei edge enhancement filter [2]. Thanks
to the enhancement, the strong temperature gradient between the
Gulf Stream (to the south) and the colder slope water mass (to the
north), as well as the temperature signatures of several eddies and
smaller vortices, are easily seen. The image graphically depicts aspects
of the synoptic oceanography of the region. It is likely that
conventional image understanding techniques (edge detection,
segmentation, classification) could provide adequate analysis of this
image. Even so, an experienced human interpreter would have the
advantage of knowing such shortcuts as "the approximately circular
features south of the Gulf Stream are probably mesoscale features
called eddies, and their large central cores of slowly counterclockwise
rotating water are colder than the surrounding water mass." While
existing automatic image understanding might do the job, human
interpretation does it better.
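
The two enhancement steps mentioned for Figure 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy 5 x 5 image are hypothetical, and a plain central-difference gradient magnitude stands in for the modified Chen-Frei filter, whose details are not given here.

```python
def linear_stretch(image, out_min=0, out_max=255):
    """Linearly rescale pixel values so the darkest maps to out_min
    and the brightest maps to out_max."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (v - lo) * scale) for v in row] for row in image]


def gradient_magnitude(image):
    """Central-difference gradient magnitude (a stand-in edge enhancer,
    not the Chen-Frei filter). Border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Hypothetical IR counts: one water mass in the top rows, another in the
# bottom rows, with a narrow transition (a front) in the middle row.
toy = [
    [100, 100, 100, 100, 100],
    [100, 100, 100, 100, 100],
    [110, 110, 110, 110, 110],
    [120, 120, 120, 120, 120],
    [120, 120, 120, 120, 120],
]
stretched = linear_stretch(toy)
edges = gradient_magnitude(stretched)
```

After the stretch the 20-count range fills the whole display range, and the gradient response peaks along the middle row, where the front lies.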

Figure 2 is more typical of satellite IR images of the ocean. While
it is possible to recognize some of the ocean features in the scene,
portions of the image are cloud covered. Traditional image processing
techniques would experience failures at trying to determine the
oceanographic context of the scene. An experienced interpreter who
is particularly knowledgeable of the region would have a much
easier time.


Figure 1. Edge-enhanced clear satellite IR Gulf Stream image.


Figure  2.  Cloudy  satellite  IR  image  of  same  area. 


Considering the above sketch of the problem, it is possible to outline
some of the characteristics of a hypothetical computerized system
to support the operational oceanographic image analyst. Rather than
performing large scale numerical calculations to solve problems of
a predefined type in a step-by-step manner, the system should be able
to employ past experience to solve new problems. It should be able
to draw conclusions from a store of task-specific knowledge. It should
be able to draw those conclusions principally through logical or
plausible inference, not necessarily by numerical calculation. It should
be able to do the type of things which a human expert does after "years
of experience."

NORDA  SATELLITE  IMAGERY  PROCESSING  FACILITY 

The following description presents the context of NORDA's
capabilities for working with imagery data. The NORDA Satellite
Data Receiving and Processing Facility (SDRPS) offers the latest in
capabilities for acquiring digital satellite ocean imagery whose scenes
may cover any part of the globe [3]. Data signals can now routinely
be received from the following satellites: NOAA series polar orbiters,
Geostationary Operational Environmental Satellites (GOES), and the
Defense Meteorological Satellite Program (DMSP). Imagery is,
however, only collected on request to support ongoing research
projects.

The Interactive Digital Satellite Imagery Processing System (IDSIPS)
consists of three I2S Model 70 image processing work stations and
provides the NORDA scientists with the main image processing
capability. These systems are used for research and the development
of applications algorithms.

The research and development being conducted for the GEOSAT
Program has a separate I2S Model 75, interfaced with a Gould SEL
32/27. A classified data transmission line from JHU/APL brings
satellite altimeter data into the GEOSAT processing facility. From
the GEOSAT altimeter data and following considerable processing,
the dynamic topography (subtrack) of the mean surface is resolved.

A broad range of oceanography related interactive image analysis
projects are being conducted by the Remote Sensing Branch. They
include basic research at the 6.1 level through advanced development
at the 6.3 level.

PLANNED  WORK 

Interpretation of satellite imagery requires familiarity with several
disciplines: the physics and geometry of satellite remote sensing;
characteristics of features of interest (features in the region observed
and the general oceanographic context); and data manipulation,
analysis, and display techniques. It seems likely that a successful
oceanographic data processing/interpretation expert system could assist
experienced researchers as well as inexperienced workers.

A study for NORDA [4] that considered potential applications of
expert systems techniques to the problem of understanding satellite
ocean images recommended the development of a system to support
the interpretation and understanding of mesoscale ocean features
as the application most likely to produce results in the near future.
The anticipated large increase in the volume of remotely sensed data
was cited, but another basis for the recommendation was the
recognition that the demand for more detailed, more specialized
synopses is likely to increase. Consequently, since processing the
relevant information to provide timely synopses at the desired level of
detail will require a high degree of automation, coordination, and
expertise, this area was selected as a candidate for expert systems
development.

The nature of the problem is sufficiently complex that development
is likely to be required in areas other than expert systems per
se. Three other areas are involved:

• Image sequence analysis - the processing of multi-temporal image
sequences of a given area in order to better understand the
time and space evolution of ocean features of interest. Not only
must individual images be processed, but additional information
of a sequential nature must be extracted (such information may
make the processing of new images simpler or more reliable).

• Multi-sensor data integration - a requirement to ensure that
inferences about features are based on consistent processing of all
data, imagery and other types. The combination of detailed
along-track coverage provided by altimetry and the synoptic view
provided by IR imagery, as recommended by Leitao, Huang, and
Parra [5] and as is now being used at NORDA in the Navy
GEOSAT Ocean Applications Program [6], is an example.

• Image understanding - the processing of individual images to
obtain a decomposition into ocean features as well as the
processing of image sequences to develop a representation of the
evolution of those features.

The system recommended for development is one intended to support
the analysis of evolutionary, structural changes in features, such
as ocean fronts and eddies in an area of the ocean, by processing
a series of registered images and other sensory information from that
area. In that prototype, interactive, semiautomated processing is to
be supported by an expert system. Figure 3 shows the proposed
organization of such a system. The data base consists of two parts:
a static data base (SDB) and a dynamic data base (DDB). The SDB
would contain the knowledge base (facts and inference rules) that
would represent the then current understanding of the problem
domain. That knowledge base would likely be composed of subjective
knowledge of experts plus quantitative or statistical results of studies,
such as studies of the space-time structure and variability of specific
water masses. The DDB would contain the then current facts about
the area under investigation, plus a model of the area that would
describe suspected features, boundaries, etc., and information on the
evolution of dynamic processes. The expert system would contain
the "inference engine" to apply inference rules (from the SDB) to
information in the DDB; the results of the application of the rules
would be new facts in the DDB. In the initial system there would be no
direct connection between the image processing workstation and the
expert system. A later version is planned in which the two would
be coupled.

Figure 3. Organization of prototype.
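
The cycle described above, in which rules held in the SDB are applied to facts held in the DDB and the results are written back to the DDB as new facts, can be sketched as a simple forward-chaining loop. This is a minimal sketch under assumptions, not the design of the proposed system: the function name, the rule format (a tuple of premises and one conclusion), and every rule and fact shown are hypothetical.

```python
def forward_chain(rules, facts):
    """Apply every rule whose premises are all present until no rule
    adds a new fact; return the enlarged set of facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts


# Static data base: hypothetical inference rules (premises -> conclusion).
SDB_RULES = [
    (("circular_feature", "south_of_gulf_stream"), "probable_eddy"),
    (("probable_eddy", "colder_than_surroundings"), "cold_core_eddy"),
    (("cold_core_eddy",), "counterclockwise_rotation"),
]

# Dynamic data base: hypothetical facts about the area under investigation.
ddb = {"circular_feature", "south_of_gulf_stream", "colder_than_surroundings"}
ddb = forward_chain(SDB_RULES, ddb)
```

Keeping the rules apart from the facts mirrors the SDB/DDB split: the rule list changes only when the understanding of the domain changes, while the fact set is rebuilt for each area and image series.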

The initial system will provide a prototype for experimentation
and development of techniques. The experimentation will reveal flaws
in the initial set of inference rules, and will help in the organization
of oceanographic knowledge into a more efficient form for system
implementation. The acquisition of new knowledge in ocean image
interpretation and its subsequent use in the system will make it possible
to enhance the prototype. By a "bootstrap" approach, it will be
possible to develop a progressively more refined, more sophisticated
system. That later system should make possible more automated, less
interactive image-sensor data processing. The development of this
prototype system will accomplish some of the longer range goals set
forth earlier in this paper.

REFERENCES 

1. Lybanon, M. and J. D. McKendrick. "Some Applications of
Image Processing in Oceanography." 13th Southeastern Symposium
on System Theory, March 1983.

2. Frei, W. and C. Chen. "Fast Boundary Detection: A
Generalization and a New Algorithm." IEEE Transactions on Computers,
Vol. C-26, No. 10, October 1977.

3. Hawkins, J. et al. "Remote Sensing at NORDA." Eos
Transactions, American Geophysical Union, Vol. 66, No. 23, pp. IK’,
June 4, 1985.

4. Thomason, M. G. and J. R. B. Cockett. Expert System Support
for the Interpretation/Understanding of Oceanic Images and
Sensory Data. Final Report, Contract PON i«K»l I S I M nod"), May
31, 1984.

5. Leitao, C. D., N. E. Huang, and C. G. Parra. "A Note on the
Comparison of Radar Altimetry With IR and In Situ Data for the
Detection of the Gulf Stream Surface Boundaries." Journal of
Geophysical Research, Vol. 84, No. B8, pp. 3969-3973, 1979.

6. Lybanon, M. GEOSAT Ocean Applications Program (GOAP)
Initial Data Processing and Analysis System Test and Evaluation
Plan. NORDA Technical Note J'd, Naval Ocean Research and
Development Activity, NSTL, Mississippi, April i -<




The ESPI Vision System


Thomas C. Rearick

Lockheed-Georgia Company
Marietta, Georgia 30063


Abstract 


The ESPI vision system is an experiment to determine the feasibility
of achieving very fast domain-independent computer image
understanding. The term "understanding" is meant to imply the
construction of some conceptual representation that is consistent
with both internalized world knowledge and externally sensed data.
Data reduction and modular design are discussed with respect to
satisfying real-time design criteria. A new form of image
representation is demonstrated which features bandwidth reduction,
lossless reversibility (to and from pixel representation), and ease of
computer manipulation and exploitation of global data dependencies.
The use of preattentive vision and focus-of-attention operators are
shown to reduce processing bandwidth and the incidence of
backchaining. A novel approach to the implementation of real-time
hardware is proposed.


I.  Introduction 

The objective of the Experimental Symbolic Processing of Imagery
(ESPI) effort is to rapidly prototype a system that demonstrates the
feasibility of real-time image understanding. In order to achieve this
goal, the ESPI vision system implements several novel and original
features. This paper discusses some of these features as well as their
philosophical justification. Rapid prototypes can identify high risk
problem areas quickly so that the scope and dimension of a problem
may be assessed.

The term "image understanding", as used in this paper, is defined as
the construction of conceptual representations or models which are
consistent with both internalized world knowledge and one or more
two-dimensional image patterns being sensed.


This definition of "image understanding" is general enough that
human perception may be included. This definition does rule out
many classical pattern recognition systems, however. One would have
to stretch the definitions of conceptual models and world knowledge
to include feature spaces and discrimination functions. Conceptual
models might include frames, augmented transition networks,
semantic nets, conceptual dependency notations, or semantic
grammars. While feature spaces assume continuous and probabilistic
models, conceptual models do not. One of the best examples of
"understanding" is MARGIE [Sha73] and the series of programs from
Yale that have followed it. These programs, based on Schank's
Conceptual Dependency [Sha72] meaning representation language,
create representations that are consistent with a natural language
input and internalized knowledge (in the form of scripts, plans, and
goals).

This definition of image understanding rules out parametric vision
systems which attempt to construct three dimensional geometric or
volumetric models. Examples of these approaches include photometric
stereo or structured light. Volumetric models are not conceptual
structures, though they may use propagation of constraints in order
to achieve consistency. Examples of early image understanding
systems (in the broader sense) include the work of Roberts [Rob65],
Guzman [Guz68], and Huffman and Clowes [Clo71].

More recent image understanding systems include the photo
interpretation system of Nagao and Matsuyama [Nag80], VISIONS
[Han78], ACRONYM [Bro79], and Mapsee2 [Hav80].

This paper does not attempt to describe the architecture, the
application, or the implementation of the ESPI vision system. A few
novel features, research objectives, and preliminary results are
presented. Section II states explicitly the methodological perspective
used in this research [Mar82]. Section III discusses three design
principles of the ESPI vision system. Section IV presents three areas
of research.

II. Methodological Perspective

The goal of this research is the development of principles and
theories of image understanding which apply to natural as well as
artificial systems. The ESPI vision system is an attempt to develop one
instance of an artificial domain-independent image understanding
system which lends support to general theories about all image
understanding systems: animal, insect, or machine. Sources of these
general theories include the physical sciences and, to a lesser extent,
the biological sciences. Biological processes and mechanisms are
considered useful for supporting these general principles only when
their physical mechanisms are understood and when their purpose is
understood within the context of evolution, ecological niche, and
cultural interaction.


III. Philosophical Basis

The development of the ESPI vision system is based on a few
pragmatic design guidelines as well as general vision principles.
A few of these guidelines and principles are discussed separately
although they are related.

All Solutions Are Not Real-Time

Response  time  constraints  are  an  integral 
part  of  the  image  understanding  problem. 
Image  understanding  systems  will  be  of 
little  use  in  future  applications  if  they 
are  unable  to  operate  in  "real-time". 

There  may  be  many  solutions  to  the  general 
vision  problem  but  very  few  of  those  may 
have  practical  implementations.  To  use  a 
game  playing  analogy,  it  is  understood  how 
to  write  a  program  that  selects  the  best 
move  in  chess  (assume  that  it  is  sufficient 
to  calculate  all  possible  moves  to  thirty 
levels).  We  don’t  know  how  to  implement 
this  program  so  that  we  may  be  assured  of 
getting  an  answer  in  our  lifetime.  Humans 
provide  an  example  of  what  is  possible  in 
complex  tasks  such  as  chess  playing  or 
image understanding if one is willing to
trade  off  guaranteed  optimality  for  speed. 
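
The scale of the chess example can be made concrete with a short calculation. This is a rough sketch under assumptions: the branching factor of 35 moves per position and the rate of one billion positions per second are illustrative figures that do not come from the text; only the depth of thirty levels does.

```python
def exhaustive_search_years(branching_factor, depth, positions_per_second):
    """Leaf positions in a full game tree of the given depth, and the
    years needed to visit them at a fixed evaluation rate."""
    positions = branching_factor ** depth
    seconds = positions / positions_per_second
    return positions, seconds / (365.25 * 24 * 3600)


# Assumed values: 35 legal moves per position, 1e9 positions per second.
positions, years = exhaustive_search_years(35, 30, 1e9)
print(f"{positions:.1e} positions, roughly {years:.1e} years")
```

Under these assumptions the tree has more than 10^46 leaf positions, so even a very fast machine would need far longer than a lifetime to search it.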

Development  of  the  ESPI  vision  system  is 
being  guided  by  several  real-time  design 
guidelines. These include 1) nondispersive
or constant-time search, 2) rigidly
controlled  backward  chaining,  3)  no  non- 


!  .  a  ’  »  a '  3  ?■  fiil'.j  6  a  r  j . . 

Data reduction may be part of a rapid image understanding system
as long as no information loss is associated with it. Data reduction
is typically
performed in feature based pattern recognition systems and can
operate very fast, but valuable information is often lost. The
consequence of this information loss is to transform a potentially
overconstrained vision problem into an underconstrained one. Pixel
based techniques are not lossy, as feature based techniques are, but
they are very inefficient at exploiting global data dependencies.
Processing cones or pyramids are an attempt to simplify the
exploitation of global data dependencies in pixel oriented
representations.

Consequences of this approach include an increase in the amount
of data that must be processed and the loss of semantically
important detail in lower resolution processing levels.

In section IV.1 a new approach to image
representation  is  described  which  features 
high  fidelity  data  compression,  reduced 
processing  bandwidth  for  image  under¬ 
standing  functions,  and  an  ease  in 
exploiting  global  context. 

A control strategy found in animal vision systems for reducing
processing bandwidth requirements is the indexing function or
focus-of-attention operator. In any goal oriented vision function, it
should not be necessary to process each visual region with the same
computational effort as every other visual region. As an example,
empty sky may not receive the same attention from a working truck
driver as the highway does even though the sky is within the driver's
field of view. Indexing functions in the ESPI vision system achieve
the reduced processing bandwidth requirement by sifting out only the
most relevant and interesting parts of an image. Indexing functions
are defined in greater detail in section IV.2.
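
The idea of spending effort only on the most interesting regions can be illustrated with a small sketch. It rests on assumptions throughout: the interest measure (the range of pixel values within a tile), the tile size, and the function names are hypothetical choices and are not the ESPI indexing functions.

```python
def tile_interest(image, tile):
    """Map each tile origin (row, col) to max - min of the pixel values
    inside that tile, used here as a crude interest score."""
    scores = {}
    for r0 in range(0, len(image), tile):
        for c0 in range(0, len(image[0]), tile):
            vals = [v for row in image[r0:r0 + tile] for v in row[c0:c0 + tile]]
            scores[(r0, c0)] = max(vals) - min(vals)
    return scores


def focus_of_attention(image, tile, k):
    """Return the origins of the k highest-scoring tiles."""
    scores = tile_interest(image, tile)
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:k]


# Hypothetical scene: uniform "sky" in the top half, a "highway" with a
# bright lane marking in the bottom-left tile.
scene = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [50, 250, 50, 60],
    [50, 250, 50, 60],
]
attended = focus_of_attention(scene, tile=2, k=1)
```

Only the selected tiles would be passed on to more expensive processing; the uniform sky tiles score zero and are never examined further.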

Principle of Modular Design

The  existence  of  a  modular  organization  in 
the  human  visual  system  is  suggested  by  the 
apparent  ease  with  which  color  blind  or 
one-eyed  individuals  adapt  to  a  colorful, 
3-D world. If vision were not commonly an over-constrained problem
and if human vision were not modular, then one could not expect to
make sense from very simple line drawings.

David Marr [Mar82] suggests that modularity found in biological
vision systems was a prerequisite to the successful evolution of the
human visual system. Since evolution depends on a series of
incremental and successful mutations, each mutation must be
localized in its effect. Otherwise, mutations successful in one area
might have a devastating consequence on other areas of the organism.
These theories may be very speculative, but that can not diminish the
importance of modular design in creating complex vision systems.

In order to minimize process feedback, it would be helpful to
partition the vision system into independent context domains.
Feedback in image understanding systems is generally accepted to be
a necessary and valuable feature. One example is the control of
segmentation by higher level processes [Han79, Nag80, Naz84]. But
process feedback carries with it a penalty. It either introduces a delay
in system response or it causes synchronization errors.
Synchronization errors result when feedback is computed from earlier
imagery that differs from the imagery being affected.


IV. ESPI Vision System

The following three areas of active image understanding research
are in various stages of development. The first one, a novel form of
image representation, is relatively mature. The second area describes
a control mechanism for distributed image understanding systems.
The last area addresses the pragmatic problem of implementing
complex AI software in distributed real-time hardware. Other active
ESPI research areas that are not discussed here include 1)
noniterative, reversible medial axis transformations, 2) constant-time
search (with respect to the size of the database), 3) rewriting rules for
multi-dimensional grammars, and 4) definition and application of
(visual) scripts (after Schank) and routines (after Ullman [Ull83]).

IV.1 Generalized Cylinder Picture Elements

The idea of the generalized cone or cylinder was introduced by
Binford [Bin71] as a way of representing shapes in a computer. Other
forms of shape description have been proposed also. A few of these
include strip trees [Bal79], polygonal approximation of region edges
[He Kt], pyramids or reduced resolution hierarchies, medial axis
transformations [Ros66, Rut66], and numerous quadtree
representations. Each of these approaches attempts to rectify the
problems that picture elements or pixels have in expressing global
data dependencies.

Figure 1. A Shape Represented by Two Gyxels. (Labels: vertex V2
with location X2, Y2, radius R2, and order; radius R3; segment S1;
curve C1.)

Figure 2. Pixel to Gyxel Transformation. (Panels: original binary
image; image skeleton; inferred global structure; list representation
(vertex list, segment list); gyxel superposition; reconstructed image.)

We present a new form of image representation (not description)
that combines features of the medial axis transformation and the
generalized cylinder. It is called the Generalized cYlinder p(ic)ture
ELement or gyxel. Each gyxel consists of a two-pulley/belt shape,
shown in figure 1. Gyxels are constructed from binary images by
performing a medial axis transform (MAT), parsing the skeleton into
vertices and connecting segments, performing linear interpolations on
the segments, removing small topological errors or artifacts of the
MAT, then including interpolated width information into the vertex
list. This is illustrated in figure 2. An image can be quickly
reconstructed from this list format to a frame buffer or the bit-mapped
display of a workstation.


The  gyxel  provides  a  potentially  lossless 
representation  of  greyscale  images  which 
are  efficiently  stored  and  manipulated  in 
list  notation.  By  adjusting  linear 
interpolation  error  parameters  (for  curve 
and  width  interpolations)  as  well  as  the 
number  of  greyscale  levels  (or  colors  or 
textural  types),  one  may  trade  off 
reconstruction  fidelity  for  bandwidth 
reduction.  An  original  256x256  black- 
and-white  image  and  its  gyxel  reconstruc¬ 
tion  are  illustrated  in  figure  7.  By 
performing  segmentation  labeling  prior  to 
transformation  into  gyxel  format,  indivi¬ 
dual  gyxels  may  be  labeled  with  respect  to 
color,  shade,  or  texture.  LISP  is  the 
development language of choice for performing gyxel manipulation.
In addition, gyxels, displayed on bitmapped screens, may exist as
objects with methods or processes attached to them. A more
detailed treatment of gyxels and techniques for their manipulation
is in preparation.


Figure 3. Correction of Structural Errors in MAT


Errors due to the linear MAT skeleton interpolation or the linear
width interpolation can be adjusted if one wishes to trade off image
fidelity for bandwidth. The image is represented in the form of a
vertex list and a segment list. Each vertex element of the vertex list
contains a vertex number, x and y locations, order (number of
coincident segments), and radius. Each segment element of the
segment list contains a curve number (shared by all segments of a
connected set), a segment number, the originating vertex number, the
destination vertex number, and a color/texture label. These lists are
easily manipulated in the LISP programming language.
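
The vertex and segment lists described above can be sketched as two small record types together with a reconstruction routine. This is an illustrative sketch under assumptions and is written in Python rather than the LISP used by the author. The field names follow the text, but the class names, the function name, and the fill rule are hypothetical: a pixel is filled when it lies within the linearly interpolated radius of the nearest point on a segment, which only approximates the two-pulley/belt outline when the two radii differ.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Vertex:
    number: int
    x: float
    y: float
    order: int      # number of coincident segments
    radius: float


@dataclass
class Segment:
    curve: int        # shared by all segments of a connected set
    number: int
    origin: int       # originating vertex number
    destination: int  # destination vertex number
    label: str        # color/texture label


def rasterize(vertices, segments, width, height):
    """Reconstruct a binary image (list of rows) from the two lists."""
    by_number = {v.number: v for v in vertices}
    image = [[0] * width for _ in range(height)]
    for s in segments:
        a, b = by_number[s.origin], by_number[s.destination]
        dx, dy = b.x - a.x, b.y - a.y
        length_sq = dx * dx + dy * dy
        for py in range(height):
            for px in range(width):
                # Parameter of the closest point on the segment, in [0, 1].
                if length_sq == 0:
                    t = 0.0
                else:
                    t = ((px - a.x) * dx + (py - a.y) * dy) / length_sq
                    t = max(0.0, min(1.0, t))
                radius = a.radius + t * (b.radius - a.radius)
                if hypot(px - (a.x + t * dx), py - (a.y + t * dy)) <= radius:
                    image[py][px] = 1
    return image


# One hypothetical gyxel: a small "pulley" at (2, 4), a larger one at (8, 4).
vertices = [Vertex(1, 2, 4, 1, 1.0), Vertex(2, 8, 4, 1, 2.0)]
segments = [Segment(1, 1, 1, 2, "black")]
image = rasterize(vertices, segments, width=12, height=9)
```

Because each segment is drawn independently, the cost of reconstruction grows with the number of segments rather than with the complexity of the region outlines.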

The  medial  axis  transform  has  a  major,  well 
documented  flaw  which  may  explain  why  it 
has  gained  so  little  popularity.  Similar 
shapes  differing  only  by  small  edge 
perturbations  or  "holes"  may  account  for 
very  different  skeletal  shapes.  One 
solution  to  this  problem  is  to  spatially 
low  pass  filter  an  image  to  the  point  where 
similar  shapes  will  feature  similar 
skeletons.  This  approach  is  not  only 
information  lossy,  but  often  it  does  not 
work. 


We  have  demonstrated  a  nonlossy  approach 
which  uses  very  simple,  low  level  LISP 
functions  to  infer  global  structure  from 
topologically different skeletons. Figure
3.a illustrates two star shapes: one of
the star shapes contains two "holes". The
resulting medial axis transformation of
these two shapes is shown in figure 3.b.
Note that even the "perfect" star does not
result  in  a  symmetric  skeleton.  By  applying 
a  series  of  very  simple  LISP  functions  to 
the  segment  and  vertex  lists  representing 
the  skeleton,  it  is  possible  to  infer  a 
five-spur  symmetric  structure.  This  is 
illustrated  in  figure  3.c.  The  reader  is 
referred to [Rea85.2] for a complete
description  of  this  simple  inference 
process.  Although  the  example  illustrates 
the  reconstruction  of  a  skeleton,  a  gyxel 
reconstruction  would  have  been  very  simple. 


.  1  Indexing  Functions 

Indexing [Ull83] is an operation used in
vision systems to shift processing focus.
By shifting processing focus to unique
"odd-man-out" locations in an image, a
vision system is able to formulate an
initial hypothesis about the scene without
processing a potentially crippling amount
of irrelevant data. In another sense,
indexing functions select the most
"interesting" data. In this way, they
reduce the computational demands placed on
hardware. Because the sifted data is so
interesting, initial hypotheses are likely
to be based on the most characteristic
features of a scene. The likelihood of
making mistakes may be less; this can
result in less backchaining.

The  existence  of  indexing  functions  in 
humans is suggested by psychological
evidence  and  by  physiological  data. 

Indexing functions assume two very
different forms: task-dependent and
task-independent. Task-independent indexing
has been called preattentive vision
[Jul81]. It is characterized by being
seemingly instantaneous, effortless, and
sensitive over a wide field of view. The
perception of textural boundaries is only
one example of preattentive vision.
Attentive or task-dependent vision, on the
other hand, is goal-directed. Goal-directed
visual search is serial (much slower than
preattentive vision) and usually is
operative only over a relatively small
field of view. Since the term
"focus-of-attention" usually refers to attentive
vision, we have selected "indexing" to
refer to both attentive and pre-attentive
forms.

Although several image understanding
systems exist which use focus-of-attention
operators [Bro82, Naz84], they are
goal-oriented or task-dependent. The ESPI
vision system combines task-dependent and
task-independent focus-of-attention
operators. This provides multi-level
bandwidth reduction for low (task
independent) level processes as well as higher
(task dependent) processes.

One example of preattentive vision that
will be integrated into the ESPI vision
system is the YATA texture discrimination
algorithm [Rea85.1]. The YATA algorithm
models Julesz' Texton theory of
preattentive vision [Jul81]. It is able to
discriminate two different textures having
identical Fourier power spectra (and so,
identical intensity mean and variance
distributions). This is demonstrated in
figure 4. Detection of subjective contours
by the YATA algorithm is illustrated in
figure 5.

An early example of two goal-directed
indexing functions is shown in figure 6.
The unordered segment list and vertex list
are filtered with respect to qualities
which might suggest "obvious" roads or
buildings. The objects identified in
figures 6.b and 6.c are hypotheses which
may then be tested in a distributed
multiprocessing environment (note: this
example uses an early reconstruction form
using constant width gyxels). Once these
hypothetical objects are validated, they
are used to suggest the location of less
obvious roads and buildings. This
incremental approach serves two purposes: data
reduction decreases computational bandwidth,
and there is less chance of mistake or
backchaining, because the process begins
with hypotheses which are least risky or
most obvious (like road shapes found in
aerial vision applications).
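
A goal-directed indexing function of this kind can be thought of as a filter over the segment list. The following Python sketch is only an illustration under stated assumptions: the paper does not say which qualities were tested, so the two criteria used here (a long segment with nearly constant width) and both thresholds are hypothetical, as are the function name `obvious_roads` and the dictionary layout.

```python
import math


def segment_length(seg, vertices):
    """Euclidean distance between the two end vertices of a segment."""
    x0, y0, _ = vertices[seg["origin"]]
    x1, y1, _ = vertices[seg["destination"]]
    return math.hypot(x1 - x0, y1 - y0)


def obvious_roads(segments, vertices, min_length=50.0, max_width_change=0.2):
    """Return the numbers of segments that are long and of nearly constant width.

    vertices maps a vertex number to (x, y, radius); both thresholds are
    hypothetical values chosen for the example.
    """
    hits = []
    for seg in segments:
        r0 = vertices[seg["origin"]][2]
        r1 = vertices[seg["destination"]][2]
        # Relative change in radius along the segment (guard against zero radius).
        width_change = abs(r1 - r0) / max(r0, r1, 1e-9)
        if segment_length(seg, vertices) >= min_length and width_change <= max_width_change:
            hits.append(seg["number"])
    return hits


# Hypothetical data: vertex number -> (x, y, radius).
vertices = {
    0: (0.0, 0.0, 4.0),
    1: (100.0, 0.0, 4.2),
    2: (10.0, 10.0, 2.0),
    3: (110.0, 5.0, 9.0),
}
segments = [
    {"number": 0, "origin": 0, "destination": 1},  # long, constant width
    {"number": 1, "origin": 0, "destination": 2},  # too short
    {"number": 2, "origin": 2, "destination": 3},  # long, but width varies
]
road_hypotheses = obvious_roads(segments, vertices)
```

Only the segments that pass the filter become hypotheses, so later, more expensive processing sees a small fraction of the original list.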


IV.3 Distributed Artificial Intelligence

It  is  our  belief  that  the  mapping  of 
complex  AI-type  functions  onto  general 
purpose  multiprocessing  systems  for 
real-time  applications  is  not  only  risky 
but  unnecessary.  The  overhead  due  to 
interprocessor communication is likely to
be  much  greater  in  symbolic  processors  than 
it  has  been  in  numerical  processors.  Two 
recent  developments  have  made  the  prospect 
of  custom  hardware  practical  and  cost 
effective.  The  DARPA  sponsored  MOSIS 
circuit  fabrication  process  will  soon  be 
able  to  turn  CAD  designs  into  tested 
silicon  within  30  days.  Object  oriented 
languages or functional languages provide a
software development and simulation
environment. Moreover, once a program
written in an object oriented language is
debugged, it may be mapped directly into
virtual machine/FIRMWARE CAD descriptions.

In  several  years,  it  should  be  possible  to 
go  from  a  software  simulation  to  working 
silicon  in  much  less  than  one  month.  It 
will  be  a  cheaper,  faster,  and  more 
reliable  approach  to  prototyping  than  the 
current  approach.  Since  the  architecture 
is  customized,  there  will  be  much  less 
interprocess  communication  overhead  than 
would  be  found  in  general  purpose  machines. 

The  ESPI  vision  system  is  being  developed 
in  an  object  oriented  language.  This 
language  is  currently  being  prototyped  in 
LISP.  The  Smalltalk-80  programming 
language  [Gol83,  Gol84]  has  served  as  a 
model  for  the  implementation  of  this 
simulation/development language. The
definition of the virtual machines and
implementation feasibility studies remain
to be done. The virtual machines will
probably include RISC computers as well as
simpler automata. The simulation (as well
as the implementation) will reflect many of
the design issues promoted for data flow
machines [Den81] and parallel architectures
for AI [Hew84].


Figure 4. Discrimination of Textures Having Identical Fourier
Power Spectra

Figure 5. Detection of Subjective Contours

Figure 6. Example of Simple Indexing Functions



V. Conclusions

In evaluating the feasibility of real-time
image understanding systems, the ESPI
vision system is testing a variety of novel
approaches to shape and knowledge
representation, image understanding control
mechanisms, and implementation schemes. The
potential for additional technological
breakthroughs seems promising.

Significant breakthroughs have been made to
date. These include the first rigorous
computer implementation of the Texton
Theory of Preattentive Vision. This
algorithm demonstrated the successful
detection of subjective contours and the
discrimination of two different textures
having identical Fourier power spectra
[Rea85.1]. A syntactical method for
removing topological errors common to the
medial axis transformation was also
demonstrated [Rea85.2]. Finally, this
paper introduces a lossless representation
of binary images which is efficiently
stored and manipulated in list structures.


Figure 7. 256 x 256 Pixel Image and Gyxel Reconstruction


References 

[Bal79] D.H. Ballard, "Strip Trees: A
Hierarchical Representation for Map
Features," PROC. IMAGE UNDERSTANDING
WORKSHOP, April 1979, pp. 121-133.

[Bin71] T.O. Binford, "Visual Perception by
Computer", presented at the IEEE Syst.,
Sci., Cybern. Conf., Miami, FL, invited
paper, Dec. 1971.

[Bro79] R.A. Brooks, R. Greiner, and T.O.
Binford, "The ACRONYM Model-based Vision
System", PROC. IJCAI-6, Tokyo, Japan,
August 1979, pp. 105-113.

[Bro82] Roger A. Browse, "Knowledge-based
visual interpretation using declarative
schemata," PhD. thesis/Tech. Rep. TN
82-12, Univ. British Columbia, Dept. Comp.
Sci., Nov. 1982.

[Clo71] M.B. Clowes, "On Seeing Things",
ARTIFICIAL INTELLIGENCE, Vol. 2, Issue 1,
1971, pp. 79-116.

[Den81] Jack B. Dennis, "Data Should Not
Change: A Model for a Computer System", MIT
Laboratory for Computer Science,
Computation Structures Group Memo No. 209,
July 1981.


[Gol83] Adele Goldberg and Daniel Ingalls,
SMALLTALK-80: THE LANGUAGE AND ITS
IMPLEMENTATION, Addison-Wesley, Reading,
MA, 1983.

[Gol84] Adele Goldberg, SMALLTALK-80: THE
INTERACTIVE PROGRAMMING ENVIRONMENT,
Addison-Wesley, Reading, MA, 1984.

[Guz68] A. Guzman, "Decomposition of a
Visual Scene into Three-Dimensional
Bodies", AFIPS PROCEEDINGS FALL JOINT COMP.
CONF., Vol. 33, 1968.

[Hal85] R.P. Hall and D.F. Kibler,
"Differing Methodological Perspectives", AI
MAGAZINE, Vol. 6, No. 3, Fall 1985, pp.
166-178.

[Han78] A.R. Hanson and E.M. Riseman,
"VISIONS: A Computer System for
Interpreting Scenes", COMPUTER VISION
SYSTEMS, Academic Press, A.R. Hanson and
E.M. Riseman (Eds.), New York, 1978, pp.
303-333.

[Hav80] W.S. Havens and A.K. Mackworth,
"Schemata-based Understanding of Hand-Drawn
Sketch Maps", PROC. THIRD CONF. CANADIAN
SOC. COMP. STUDIES INTEL., Victoria,
Canada, 1980, pp. 172-178.




[Hew84] Carl Hewitt and Henry Lieberman,
"Design Issues in Parallel Architectures
for Artificial Intelligence", MIT AI
Laboratory, Memo No. 750, November 1983.

[Mar82] David Marr, VISION: A COMPUTATIONAL
INVESTIGATION INTO THE REPRESENTATION AND
PROCESSING OF VISUAL INFORMATION, W.H.
Freeman, San Francisco, 1982.

[McK85] D.M. McKeown, Jr., "Alignment and
Connection of Fragmented Linear Features in
Aerial Imagery", PROC. IEEE COMP. SOC.
COMP. VISION PAT. RECOG., San Francisco,
June 1985, pp. 55-61.

[Nag80] M. Nagao and T. Matsuyama, A
STRUCTURAL ANALYSIS OF COMPLEX AERIAL
PHOTOGRAPHS, Plenum Press, New York, 1980.

[Naz84] A.M. Nazif and M.D. Levine, "Low
Level Image Segmentation: An Expert
System", IEEE TRANS. PAMI, Vol. PAMI-6, No.
5, Sept. 1984, pp. 555-577.

[Rea85.1] T.C. Rearick, "A Texture Analysis
Algorithm Inspired by a Theory of
Preattentive Vision", IEEE COMP. SOC. PROC.
COMP. VISION PATTERN RECOG. '85, San
Francisco, CA, June 1985, pp. 312-317.

[Rea85.2] T.C. Rearick, "Syntactical
Methods for Improvement of the Medial Axis
Transformation", PROC. SPIE: APPLICATIONS
OF AI II, J.F. Gilmore, ed., Arlington, VA,
Vol. 548 (1985), pp. 110-115.

[Rob65] L.G. Roberts, "Machine Perception
of Three-dimensional Solids", OPTICAL AND
ELECTRO-OPTICAL INFORMATION PROCESSING,
J.T. Tippett et al. (Eds.), MIT Press,
Cambridge, Mass., 1965, pp. 159-197.

[Ros66] A. Rosenfeld and J.L. Pfaltz,
"Sequential Operations in Digital Picture
Processing", JOUR. ACM, Vol. 13 (1966), pp.
471-494.

[Rut66] D. Rutovitz, "Pattern Recognition",
JOUR. ROYAL STATIS. SOC., Vol. 129 (1966),
pp. 504-530.

[Sha72] R.C. Shank, "Conceptual Dependency:
A Theory of Natural Language
Understanding", COGNITIVE PSYCHOLOGY, Vol. 3,
No. 4 (1972), pp. 552-631.

[Sha73] R.C. Shank, N. Goldman, C. Rieger,
and C. Riesbeck, "MARGIE: Memory, Analysis,
Response Generation and Inference in
English", PROC. 3D IJCAI, 1973, pp.
255-261.

[Ull83] Shimon Ullman, "Visual Routines",
MIT AI Laboratory, Memo No. 723, June 1983.




DYNAMIC  ARCHIVAL  SCENE  MODEL 


Hatem N. Nasr
Raj  K.  Aggarwal 
Durga  P.  Panda 

Honeywell  Inc.,  Systems  and  Research  Center 
Minneapolis, MN 55441

ABSTRACT 


This paper describes a blackboard
architecture for representing
knowledge-base and scene information for
outdoor multiscenario dynamic scene
interpretation. The blackboard, called
Dynamic Archival Scene Model (DASM),
allows intelligent compression of
sensed image information into an
efficiently structured fashion. Scene
context information is also incorporated
into the blackboard. The DASM provides
easy access to the scene model from all
the cooperating processing levels of
the vision system performing scene
interpretation. The cooperating
functional modules can either dynamically
update the information in DASM or
acquire information from DASM to perform
their individual functions. The end
result is an invariant representation
of the scene dynamics for knowledge
based interpretation of outdoor
temporal events.

I.  INTRODUCTION 

Interpretation of outdoor dynamic
scenes under variable scenario
conditions and diverse atmospheric ambients
is difficult because dynamic object and
relational representations are
difficult to define, and low-level vision
algorithms often provide ambiguous or
incomplete primitives from the sensed
image (Rosenfeld 1984). This makes
dynamic scene interpretation highly
vulnerable to variations in the image
content and quality.

There have been efforts to build
world models that provide invariant
domain knowledge for vision systems
(Mackworth 1981). Brooks et al (1979),
in their top-down
prediction-hypothesis-verification paradigm,
emphasize object driven modeling rather
than data driven modeling. Integrated
top-down and bottom-up approaches
(Matsuyama 1985) begin to address the
need for mapping of the dynamic image
representation to a world model.
Feldman et al (1978) and Nagao et al
(1980) use a novel approach to
knowledge directed image analysis in
aerial imagery. Their models offer
encouraging results; however, their
image models assume a somewhat static
environment and ideal quality imagery.
Hanson et al (1980) discuss image
models as an instantiation of a
symbolic world model in what they refer to
as short-term and long-term memory.
The model has many interesting
capabilities. However, the results are
sensitive to variations in the image quality
and content. A further detailed survey
of models for vision systems can be
found in (Binford 1982).

Reliable and robust interpretation
of outdoor scenes requires an approach
to integrating the static domain
knowledge with models of temporally
archived distortion invariant information
extracted from image data (Fischler
1973). We discuss here a model, called
Dynamic Archival Scene Model (DASM),
with emphasis on efficient image data
representation of dynamic and archival
information. Updating, information
recovery, and integration of the
dynamic information with the world knowledge
are also functions encompassed by the
DASM.

DASM  provides  a  highly  structured 
and  abstracted  image  model,  built  on  a 
blackboard  architecture  framework. 


Some  key  features  of  this  model  are: 

1. Symbolic image descriptors
coupled with contextual information
compensate for incomplete world model
representations and weak low level vision
(e.g. segmentation and edge
detection) processes. Loss of
details in the data is accounted
for in the integration of
the information.

2. DASM creates a centralized
image data-and-knowledge-base
that serves as an interface for
all processing levels (low,
intermediate, and high). This
interface facilitates top-down
feedback control of vision
processes.

3. It provides a representation
of the dynamics that is compatible with
the static world model. This
reduces the search space for
semantic matching of relationships
in the dynamic scene with
the expected models.

In the next section, we first preview
knowledge representation in the
static world model. In the following
section, we present an overview of
DASM.

II.  THE  STATIC  SCENE  MODEL 

The Static Scene Model (SSM)
contains static knowledge describing
expected scene objects and regions. It
is a schema representation of a
semantic network implemented as a
frame-based structure where slots, in
the frames, correspond to attributes
describing world objects and relations.

The SSM describes regions and
objects expected to appear in a specified
world domain, for example roads,
vehicles, bridges, vegetation, trees, and
sky. Many similar representational
approaches have been taken in the IU
community; only the particular domains
vary.

Regions and objects included in the
Static Scene Model share a certain set
of common attributes that describe them
under general conditions. These
attributes are described in a contextual and
relational manner. Shape, relative
size, relative position and texture are
examples of such attributes (or
features). These attributes are common to
all objects. Figure 1 illustrates the
content of SSM. In the Static Scene
Model the attributes have expected
values that are constrained by the
particular operating scenario, such as
ground-based sensor, airborne sensor,
etc.

These  models  in  the  Static  Scene 
Model  are  generic  objects  and  regions 
schemata,  where  the  attribute  values  in 
the  models  are  a  superset  of  the  ones 
in  the  images.  In  other  words,  the 
image model contained in the DASM is an
instance  model  of  the  SSM. 

Knowledge represented in SSM
corresponds to the intrinsic, contextual
and relational characteristics of
objects and regions. These characteristics
are explored by identifying
similarities, differences, and uniqueness
of these objects under similar and
different scenarios.

Given a particular scenario, every
region or object has a set of
geometrical, physical, contextual and
relational properties that is always valid,
under minor variations. These
different properties are described as
follows:

o  Geometry: Shape (straight line
segments, quadrilateral,
circular), concavity and convexity,
three-dimensional to two-dimensional
signature, perimeter,
rotation and translation.

o  Physical properties:
Reflectivity, absorptivity,
intensity, material composition, and
spectral characteristics.

o  Contextual and relational
properties: Expected image
location, size, neighboring
regions, inside-of other
regions, and containing other
regions or objects.
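
As an illustration of the frame-and-slot idea, the sketch below represents one SSM schema as a mapping from slot names to sets of expected values, and tests whether an image region is an instance of it, i.e. whether every value the region supplies falls inside the set the schema expects. This is a minimal Python sketch under stated assumptions: the slot names, the values in `ROAD_SCHEMA`, and the functions `violated_slots` and `is_instance` are hypothetical and are not taken from the authors' implementation.

```python
# Hypothetical SSM schema for a road: each slot lists its expected values.
ROAD_SCHEMA = {
    "SHAPE": {"ELONGATED"},
    "TEXTURE": {"SMOOTH"},
    "RELATIVE-SIZE": {"MEDIUM", "LARGE"},
    "SEGMENTOR": {"elongated-region"},   # which segmentation specialist to expect
}


def violated_slots(region, schema):
    """Slots that the region fills with a value the schema does not expect."""
    return sorted(
        slot
        for slot, value in region.items()
        if slot in schema and value not in schema[slot]
    )


def is_instance(region, schema):
    """True when no filled slot contradicts the schema.

    Slots the region leaves empty are not held against it, since low-level
    processing may deliver incomplete descriptions.
    """
    return not violated_slots(region, schema)


candidate = {"SHAPE": "ELONGATED", "TEXTURE": "SMOOTH"}
reject = {"SHAPE": "CIRCULAR", "TEXTURE": "COARSE"}
```

Treating the schema values as a superset of the observed values is one simple reading of the instance relation; a fuller system would also weigh how many slots were actually confirmed.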

Contextual information introduced
in the SSM directly relates to the
image domain. However, some
information is not derived from the image
data. This includes ancillary
information such as weather condition,
goal-driven object search, and others.
For example, not all regions are
produced by the same segmentation
algorithm. The same image is passed
through different segmentors, which are
specialists in extracting convex
regions, elongated regions, etc. This
specific region segmentor information
is part of the expected contextual
information incorporated in each object
and regional schema in the Static Scene
Model.

III.  THE  DYNAMIC  ARCHIVAL  SCENE 
MODEL 

The DASM consists of spatial and
temporal information, low level
primitives, as well as symbolic and
contextual information. The DASM also
contains a dynamically updated historical
scene buffer. Figure 2 shows the
system level block diagram. Regional
scene representation in DASM is
illustrated in Figure 3.

Local  Schema 

The DASM representation is based on
a collection of schemata and is
implemented in a frame-based data
structure. It contains attributes
(called slots), similar to SSM, and
their corresponding values which
describe image region features and their
relationship to other image regions as
shown in Figure 4.

There are two levels of image
information abstraction as illustrated in
Figure 2. The first level is a
reorganization of low-level feature vector
data into a frame-based data structure,
with no assumed inheritance. The
second level contains symbolic
descriptors, extracted from the first level,
represented in a frame-based
structure. The first level serves as
an input to the historical scene buffer
where primal data is used for compiling
temporal and spatial information.

Every schema in DASM corresponds to
a specific image region. The
collection of these schemata creates an
instance model of the Static Scene
Model. Every slot in each frame
corresponds to an image feature such as:
area, shape measure, centroid,
location, length-to-width ratio, texture
measure, etc. Some of the common
region attributes are:

TEXTURE = SMOOTH, COARSE, ETC.

SHAPE = ELONGATED, TRIANGULAR,
SQUARE, CIRCULAR, IRREGULAR

RELATIVE LOCATION = BEHIND,
IN-FRONT, NEXT-TO, ON-TOP-OF,
UNDER

ABSOLUTE LOCATION = FAR-AWAY,
MID-RANGE, NEAR, FOREGROUND,
BACKGROUND

RELATIVE SIZE = LARGE, MEDIUM,
SMALL

ABSOLUTE SIZE = LARGE, MEDIUM,
SMALL

Some of the above attributes are
the result of associating and
interpreting two or more numeric features.
For example, the AREA and RANGE produce
ABSOLUTE-SIZE, as shown in Figure 5.
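
The way two numeric features can combine into one symbolic attribute may be sketched as follows. The paper does not give the mapping of Figure 5, so this Python example assumes a simple pinhole relation (the ground footprint of one pixel grows linearly with range, so physical area grows with range squared) and uses hypothetical values for the focal length and for the SMALL/MEDIUM/LARGE thresholds.

```python
def absolute_size(area_px, range_m, focal_px=1000.0,
                  small_max_m2=4.0, medium_max_m2=40.0):
    """Map AREA (pixels) and RANGE (metres) to a symbolic ABSOLUTE-SIZE.

    focal_px and both thresholds are hypothetical; a real system would
    calibrate them for its sensor and scenario.
    """
    metres_per_pixel = range_m / focal_px        # footprint of one pixel side
    area_m2 = area_px * metres_per_pixel ** 2    # approximate physical area
    if area_m2 < small_max_m2:
        return "SMALL"
    if area_m2 < medium_max_m2:
        return "MEDIUM"
    return "LARGE"


# The same 400-pixel region reads differently at different ranges.
near = absolute_size(400, 50.0)     # about 1 square metre
mid = absolute_size(400, 200.0)     # about 16 square metres
far = absolute_size(400, 500.0)     # about 100 square metres
```

The point of the example is that neither AREA nor RANGE alone determines the label; only their association does.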

Primal  Model 

The primal model, called Archival
Scene Model (ASM), is a central database
that contains all the relevant primal
information about the current scene,
such as intermediate image results,
range and transform estimates, as well
as information pertaining to individual
regions such as features,
silhouettes, and tracking data. Multi-frame
information at the image, region, and
segment levels is also stored in the
ASM.


Historic  Buffer 

DASM contains archived information
from a temporal sequence of image
frames which form the historic scene
buffer. The data is organized under a
scene number and corresponding
regions. Figure 6 illustrates how such
a buffer is constructed. This
historical scene buffer serves as a reference
for object motion detection,
prediction, and tracking. It also provides a
means for compiling temporal and spatial
information.
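
A buffer organized by scene number and region can be sketched with a bounded queue. The Python sketch below is an assumption-laden illustration, not the authors' design: the class `HistoricBuffer`, its depth, the choice to store only region centroids, and the constant-velocity prediction are all hypothetical.

```python
from collections import deque


class HistoricBuffer:
    """Keeps the most recent frames; each frame maps region id -> centroid."""

    def __init__(self, depth=5):
        # Oldest frames fall off automatically once depth is exceeded.
        self.frames = deque(maxlen=depth)

    def archive(self, scene_number, regions):
        """Store one frame: its scene number and a dict of region centroids."""
        self.frames.append((scene_number, dict(regions)))

    def track(self, region_id):
        """Chronological list of (scene_number, centroid) for one region."""
        return [(n, regs[region_id]) for n, regs in self.frames if region_id in regs]

    def predict(self, region_id):
        """Constant-velocity guess at the region's centroid one frame ahead."""
        history = self.track(region_id)
        if len(history) < 2:
            return None
        (n0, (x0, y0)), (n1, (x1, y1)) = history[-2], history[-1]
        dt = n1 - n0
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        return (x1 + vx, y1 + vy)


buf = HistoricBuffer(depth=3)
buf.archive(1, {"r1": (10, 10)})
buf.archive(2, {"r1": (12, 11)})
buf.archive(3, {"r1": (14, 12)})
buf.archive(4, {"r1": (16, 13), "r2": (50, 50)})
```

With a depth of 3, the frame for scene 1 has already been discarded after the fourth call, which mirrors the sliding window of frames suggested by Figure 6.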


DASM as an Interface for Feedback

An interface is a channel of
communication between different levels of
processing, a basic requirement for
feedback control. DASM establishes
that interface as shown in Figure 7.

The interface supplies information
and data for feedback between different
levels of processing. The data
required for the interface is determined
by analyzing the inputs and outputs of
the feedback. The feedback from mid to
low level processing consists of
labeled regions and object
classifications. Outputs from low level
processing are algorithms and parameter
selection. The interface is regional scene
data from the DASM, ancillary
information and confidence measures of
labeling. This information will feed into
feedback production-rules which
transform it into low-level commands. An
example would be:

o  IF ROAD is found THEN SEARCH
for VEHICLES

o  IF GOAL = SEARCH for VEHICLES
THEN SEARCH for CONVEX objects

o  IF SEARCH for CONVEX objects
THEN run BACORE (segmentation)
at WINDOW_X

In the above example, the schema of the
road that was found is accessed from the
DASM, and it provides the input used to
calculate the window size.
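
The three rules above form a short chain, which can be sketched as a tiny forward-chaining loop. This Python sketch is only illustrative: the rule encoding, the function names `forward_chain` and `window_from_road`, the `bbox` slot, and the margin are assumptions, and BACORE appears only as a label taken from the text, not as a callable segmentor.

```python
# Each rule: (set of conditions that must hold, conclusion to add).
RULES = [
    ({"ROAD found"}, "SEARCH for VEHICLES"),
    ({"SEARCH for VEHICLES"}, "SEARCH for CONVEX objects"),
    ({"SEARCH for CONVEX objects"}, "run BACORE at WINDOW_X"),
]


def forward_chain(facts, rules):
    """Fire rules until nothing new can be concluded; return conclusions in order."""
    facts = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                fired.append(conclusion)
                changed = True
    return fired


def window_from_road(road_schema, margin=10):
    """Hypothetical window: the road's bounding box grown by a margin."""
    x0, y0, x1, y1 = road_schema["bbox"]
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


commands = forward_chain({"ROAD found"}, RULES)
window = window_from_road({"label": "ROAD", "bbox": (0, 40, 200, 60)})
```

The last conclusion in `commands` is the low-level command, and `window` stands in for WINDOW_X, derived from the road schema held in the DASM.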

IV.  EXPERIMENTS  WITH  DASM 

The Dynamic Archival Scene Model,
along with the Static Scene Model, is
an integral element of a knowledge based
vision system for outdoor scene
interpretation. The model interface
includes the discrete vision functional
modules, such as synthetic stereopsis,
optical (segment free) motion
detection, symbolic motion detection,
multi-object tracking, and object
identification. The interface also includes
low and intermediate level vision
operators such as edge operators, line
operators, texture segmentors, and
scene transform operators. Lastly, the
interface also includes the
knowledge-based controller of the
vision system. Thus the DASM acts as
an efficient central link among the
various elements of the vision system.

Incorporation of the model into the
vision system helps to resolve conflict
and ambiguity in the interpretation of
scene elements (such as object
identity) over successive temporal image
frames. It semantically confirms the
scene dynamics interpretation (such as
object motion) by archiving dynamics
information and evaluating it with
temporal and spatial context.
Experimental results show consistent and robust
output of image frame sequence analysis
in the form of reliable object
identification, minimal object [illegible],
[illegible] motion detection and [illegible], and
temporal pattern detection. Figure 8
illustrates the results over a time
sequence of image frames, where DASM
information helped in robustly
detecting and identifying objects of
interest, estimating the dynamic sensor and
object motion parameters, and tracking
the objects in the scene. Vectors
overlaid on the objects in Figure 8
indicate the predicted direction and
motion of objects over time.

V.  CONCLUSION 

We presented an approach to
integrated modeling of the Static Scene
Model and dynamic image abstraction
model in the form of the Dynamic Archival
Scene Model. The integrated model
helps facilitate correspondence of
image abstracted information with
expected world knowledge. Experimental
results show reliable interpretation of
dynamic scenes in the presence of
ambiguous and incomplete low level
operator output and in the presence of both
spatial and temporal noise.


[Figure 6 diagram: construction of a historical scene buffer. Columns:
frame being processed, last scene frame data, the frame before last
data, the Nth frame data. Rows: regions in the scene frame.]

Figure 6. Historical Scene Buffer in DASM


Figure 7. DASM: Interface Between Low Level, Mid Level,
and High Level Processes


REFERENCES 


T.O. Binford (1982), Survey of
Model-Based Image Analysis Systems, The
International Journal of Robotics
Research, Vol. 1, No. 1, pp. 18-64.

R. Brooks, R. Greiner, and T. Binford
(1979), The ACRONYM Model-Based Vision
System, IJCAI Proceedings, Tokyo, p.
105.

J.A. Feldman, D.H. Ballard, and C.M.
Brown (1978), An Approach to
Knowledge-Directed Image Analysis, in Computer
Vision Systems, ed. A. Hanson and E.
Riseman, New York: Academic.

M.A. Fischler and R.A. Elschlager
(1973), The Representation and Matching
of Pictorial Patterns, IEEE Trans. on
Computers, C-22, January.

A.R. Hanson, C.C. Parma, and E.M.
Riseman (1980), Experiments in
Schema-Driven Interpretation of a
Natural Scene, COINS Tech. Rept. 80-10,
Amherst, Mass.: Univ. of
Massachusetts.

A.K. Mackworth and W.S. Havens (1981),
Structuring Domain Knowledge for Visual
Perception, IJCAI Proceedings,
Vancouver, p. 625.

T. Matsuyama and V. Hwang (1985),
SIGMA: A Framework for Image
Understanding, IJCAI Proceedings,
California, pp. 908-915.

M.L. Minsky (1975), "A Framework for
Representing Knowledge", in The
Psychology of Computer Vision, ed. by P.H.
Winston, McGraw-Hill.

M. Nagao and T. Matsuyama (1980), A
Structural Analysis of Complex Aerial
Imagery, Plenum.

A. Rosenfeld (1984), Image Analysis:
Problems, Progress and Prospects,
Pattern Recognition, Vol. 17, No. 1, pp.
3-12.




IMAGE UNDERSTANDING RESEARCH AT GENERAL ELECTRIC
J.  L.  Mundy1 


General  Electric 

Corporate  Research  and  Development 
Schenectady,  New  York 


INTRODUCTION 

The current research at General Electric in image
understanding has evolved out of a long period of experience in
building systems for automatic visual inspection in a factory
environment [Mundy and Porter 1983]. Out of this
experience has come the opinion that there are a number of
limitations in image understanding technology that prevent its
application to general environments.

The primary weakness in the technology is that many
assumptions have to be made about the geometric and signal
processing constraints associated with a given problem in
image analysis or object recognition. The basic algorithms
are fragile in the sense that if the environment does not
agree with the assumptions within a small margin, then the
results are unpredictable and usually unsatisfactory. On the
other hand, if the constraints can be satisfied, then it is
possible to perform rather sophisticated recognition tasks with
excellent reliability [Mundy and Joynson 1977]. A system
developed to inspect small manufactured parts was able to
achieve an error rate of less than 0.1% as determined over
several years of operation.

In 1983, a new program was initiated to study
fundamental issues in image understanding. The goal of the
program is to explore new techniques that show promise in
extending robustness and flexibility of image understanding
systems. This program has two main technical thrusts. The
first is aimed at evolving new techniques in model-based
vision. The second area of interest is geometric reasoning
and its application to image understanding. This program
emphasis is based on the opinion that these are the most
important research issues for rapid progress in image
understanding. The paper will outline our ideas for a model-based
image understanding system as well as the application of
geometric reasoning to such a system.

MODEL-BASED  VISION 

The use of geometric models to locate and recognize
objects in scenes has proved surprisingly effective in dealing
with cluttered scenes with unreliable feature extraction
[Brooks 1981], [Goad 1982], [Ayache 1983], [Grimson and
Lozano-Perez 1985]. The constraints provided by the model
are a very effective filter that can eliminate many incorrect
interpretations. The match does not have to be complete
since the probability of more than a few features matching is
quite low, unless the assignment is actually correct.

1. The work reported here has involved cooperative efforts within the
Logic and Inference Systems Program at General Electric Corporate
Research and Development. Contributions to this work have been
made by M. Barry, C. Connolly, R. Jaenicke, D. Kapur, H. Ko,
D. Musser, P. Narendran, R. Stenstrom, R. St. Peters, and
D. Thompson.

The concept of model-based vision and the role of
geometric reasoning are illustrated in Figure 1. This system
block diagram indicates the major functions in a system that
uses geometric models to identify and locate objects. First,
we describe the basic aspects of this model-based vision system
that will guide our development over the next few years.


Figure 1. The overall system block diagram. This diagram indicates
the important functions in a model-based image understanding
system.


The system is divided into two major sections, an on-line
system and an off-line system. The on-line partition has the
basic function of recognizing objects by matching a relational
model of the object with a symbolic representation of the
scene. In the proposed effort, both the scene description and
the model will be primarily geometric descriptions with a small
set of ancillary properties such as surface texture and
reflectance. The off-line portion is responsible for the formation
of models and model libraries as well as the development
of efficient matching rules and strategies that are
needed to support the on-line activities.

The current hardware configuration is shown in Figure 2.
We are using the Symbolics 3600 Lisp Machine with frame
grabber and color display. We have built our image
understanding software on top of IMAGE-CALC, which is an
excellent development environment for image processing
developed by Lynn Quam at SRI [Quam 1984].




Figure 2. The hardware configuration for the image understanding
system.


SYMBOLIC  SCENE  REPRESENTATION 

The symbolic scene representation is obtained from either
a visual image sensor or a 3-D range sensor. A low level
boundary description is obtained from each sensor and
segmented into a 3-D solid surface representation. This process
involves standard algorithms for edge detection and linear
boundary segmentation. In the case of multiple intensity
views, stereo matching is used to extract a direct 3-D
description. This information is augmented by direct range
data obtained from the range sensor. It is planned to
experiment with various combinations of sensor information,
including the standard case of a single intensity image. The
final symbolic scene representation consists of line segments,
which are interpreted as intensity or range boundaries, and
regions, which are connected sets of uniform texture or
surface elements.

MODEL  MATCHING 

The second main process in the on-line system is the
retrieval and matching of an object model with the symbolic
scene description just described. In this system, the matching
process is considered to be mainly an identification of the
perspective coordinate transformation that maps a
three-dimensional solid surface model into a 2-D scene description.
The determination of the transformation is carried out by
matching elements of the scene with elements of the model.
In the simplest case, these elements would be line segments
that are matched according to orientation and position.

2-D  Model  Matching 

We have completed the development of a 2-D model
matching system which acquires models from the 2-D scene
itself. The techniques we have employed are quite similar to
other 2-D matching systems [Grimson and Lozano-Perez
1985]. One distinguishing aspect of our experiments has been
the use of color information.

The matcher is based on straight line segments that are
grown from intensity edges found by the Canny edge detector
[Canny 1983]. The line segments are determined by a
combination of curvature extrema [Asada and Brady 1984]
and a deviation tolerance [Tomek 1974].

The color information is introduced after the object
boundary segments have been determined. The color image
is transformed into intensity, hue, and saturation components.
The image segmentation is based on intensity;
however, both sides of the line segment are labeled according
to the mean and standard deviation of hue and saturation.

In this manner the “side” of a line can be matched to a
“side” in the model with a similar color. Those regions with
low saturation are not filtered by color, since hue is not a
reliable feature in that case. Also, if the hue data has a high
standard deviation, the hue value is not considered to be
very significant.
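The color test on a segment side can be sketched as follows. This is only an illustration of the rule described above, not the program used in the experiments: the `SideColor` record, the function names, and all three thresholds (`min_sat`, `max_hue_std`, `hue_tol`) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class SideColor:
    """Color statistics for one side of a line segment (hypothetical record)."""
    hue_mean: float  # degrees, in [0, 360)
    hue_std: float   # degrees
    sat_mean: float  # saturation in [0, 1]


def hue_difference(a, b):
    """Smallest angular difference between two hues, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def sides_compatible(scene, model, min_sat=0.2, max_hue_std=30.0, hue_tol=25.0):
    """Return False only when both hues are reliable and clearly differ."""
    for side in (scene, model):
        # Low saturation or widely scattered hue: hue is not a reliable
        # feature, so the pair is not filtered on color at all.
        if side.sat_mean < min_sat or side.hue_std > max_hue_std:
            return True
    return hue_difference(scene.hue_mean, model.hue_mean) <= hue_tol
```

Hue is treated as an angle, so a red at 10 degrees and a red at 350 degrees are 20 degrees apart rather than 340.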

It is not straightforward to determine a distance metric to
measure the agreement of the hue and geometric properties
of a line segment. In the current program, the line segments
are ranked by sorting the segments separately on each
measurement value (in the current case length and hue). The
sorted list is scanned until scene segments are found that are
as close as possible to a given model segment within a given
tolerance band. This ranking process serves to eliminate
irrelevant matches of the model segments into the scene.
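A minimal sketch of this ranking step is given below, assuming each segment is stored as a dictionary of measurements. The function names and the sample values are hypothetical. Ordering the survivors by their worst deviation relative to the tolerance is an assumption added for illustration (the text only says the closest segments within the band are kept), and hue is compared as a plain number, ignoring wrap-around.

```python
from bisect import bisect_left, bisect_right


def band(scene, key, target, tol):
    """Ids of scene segments whose measurement `key` is within target +/- tol."""
    order = sorted(scene, key=lambda i: scene[i][key])  # sort on one measurement
    values = [scene[i][key] for i in order]
    lo = bisect_left(values, target - tol)               # first value inside the band
    hi = bisect_right(values, target + tol)              # one past the last inside
    return set(order[lo:hi])


def rank_candidates(scene, model_seg, tol):
    """Scene segments inside every tolerance band, closest first."""
    survivors = None
    for key in tol:
        ids = band(scene, key, model_seg[key], tol[key])
        survivors = ids if survivors is None else survivors & ids

    def worst(i):
        # Largest deviation over all measurements, in units of the tolerance.
        return max(abs(scene[i][k] - model_seg[k]) / tol[k] for k in tol)

    return sorted(survivors, key=worst)


scene = {
    0: {"length": 50, "hue": 100},
    1: {"length": 52, "hue": 200},
    2: {"length": 80, "hue": 102},
    3: {"length": 49, "hue": 95},
}
model_segment = {"length": 50, "hue": 100}
ranked = rank_candidates(scene, model_segment, {"length": 5, "hue": 10})
```

Segment 1 is rejected on hue and segment 2 on length, so only segments 0 and 3 are proposed as assignments for this model segment.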

The proposed assignments from the scene into the model
are used to define clusters in the space of model-to-scene
transformations [Ballard 1981]. The transformation space for
2-D matching has three dimensions, corresponding to one
rotational and two translational degrees of freedom. In the
current algorithm, rotation is handled separately from
translation. The cluster in transformation space that is the
most compact and contains the most matches is used to
determine a mean transformation vector.
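The two-stage clustering can be sketched as a simple voting scheme. This is an illustration under stated assumptions, not the algorithm as implemented: segments are reduced to a midpoint and a direction angle, segment directions are assumed unambiguous, fixed-size bins stand in for the clustering, and the winning cluster is chosen by vote count alone (compactness is not scored). The names `estimate_transform` and `place` and the bin sizes are hypothetical.

```python
import math
from collections import defaultdict


def estimate_transform(pairs, angle_bin=math.radians(5.0), trans_bin=4.0):
    """Estimate (rotation, tx, ty, support) from proposed assignments.

    pairs: list of (model, scene) tuples; each element is (x, y, theta),
    a segment midpoint and its direction angle in radians.
    """
    # Stage 1: each assignment votes for a rotation (scene minus model angle).
    rot_votes = defaultdict(list)
    for m, s in pairs:
        dtheta = (s[2] - m[2]) % (2.0 * math.pi)
        rot_votes[int(dtheta // angle_bin)].append((m, s, dtheta))
    best_rot = max(rot_votes.values(), key=len)

    # Circular mean of the rotations in the winning bin.
    rot = math.atan2(sum(math.sin(d) for _, _, d in best_rot),
                     sum(math.cos(d) for _, _, d in best_rot))
    c, sn = math.cos(rot), math.sin(rot)

    # Stage 2: with rotation fixed, the same assignments vote for a translation.
    trans_votes = defaultdict(list)
    for m, s, _ in best_rot:
        tx = s[0] - (c * m[0] - sn * m[1])
        ty = s[1] - (sn * m[0] + c * m[1])
        key = (math.floor(tx / trans_bin), math.floor(ty / trans_bin))
        trans_votes[key].append((tx, ty))
    best = max(trans_votes.values(), key=len)
    n = len(best)
    return rot, sum(t[0] for t in best) / n, sum(t[1] for t in best) / n, n


def place(model_seg, rot, tx, ty):
    """Apply a rotation and translation to a model segment."""
    x, y, theta = model_seg
    c, sn = math.cos(rot), math.sin(rot)
    return (c * x - sn * y + tx, sn * x + c * y + ty, theta + rot)


model = [(0.0, 0.0, 0.0), (5.0, 1.0, 0.5), (-3.0, 4.0, 1.2), (2.0, -6.0, 2.0)]
true_rot = math.radians(32.0)
pairs = [(m, place(m, true_rot, 10.0, -7.0)) for m in model]
# Two incorrect assignments that should be outvoted.
pairs += [(model[0], (40.0, 40.0, 2.5)), (model[1], (-20.0, 3.0, 4.0))]
rot, tx, ty, support = estimate_transform(pairs)
```

Fixed bins can split a true cluster that straddles a bin edge; overlapping bins or a proper clustering step would be needed in practice.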

The  match  determined  by  this  mean  transformation  is 
checked  for  satisfactory  agreement  between  the  scene  seg¬ 
ments  and  the  model  segment.  In  this  case,  agreement  is 
based  on  the  number  of  matched  segments  and  the  normal 
distance  error  between  the  model  and  the  scene  segment 
positions.  If  the  agreement  is  not  satisfactory,  the  next  most 
attractive  segment  is  selected  and  tested. 

It  is  emphasized  that  the  association  of  model  segments 
and  scene  segments  is  not  implemented  as  a  tree  search.  A 
group  of  features  are  associated  with  each  model  segment 
and  are  then  clustered  in  transformation  space.  Earlier 
experiments  with  a  tree  search  algorithm  showed  that  it  is 
difficult  to  specify  a  sequential  priority  ordering  on  the  quality 
of  feature  matches. 

A sample of 2-D matching is illustrated in a series of
figures. Figure 3 shows an intensity image of a typical scene
used in these experiments. The final result of processing the
scene into line segments is given in Figure 4 and is
superimposed on the intensity data in Figure 5. The assignment of
model-to-scene segments at an intermediate stage of the
match is shown in Figure 6. The final match is shown in
Figure 7. This matching technique is able to find objects in
cluttered scenes where many of the object features are
missing or fragmented.

3-D Matching

A new experiment is currently under way to evolve to
three-dimensional polyhedral models and to match into
two-dimensional perspective intensity or color images. At this
time, the use of both line segments and vertices as features
is being explored. At present the 3-D models are created by
CAD techniques. Figure 8 shows the assignment of model
vertices into 2-D scene vertices. The scene vertices are
obtained by extending scene segments until they intersect.
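Forming a scene vertex from two segments amounts to intersecting the infinite lines through them. The sketch below shows that step only; the function name and the parallelism tolerance `eps` are assumptions, and a practical system would probably also limit how far a segment may be extended.

```python
def extended_intersection(seg_a, seg_b, eps=1e-9):
    """Intersect the infinite lines through two 2-D segments.

    Each segment is ((x1, y1), (x2, y2)). Returns the intersection point,
    or None when the segments are parallel (no finite vertex).
    """
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    dxa, dya = x2 - x1, y2 - y1
    dxb, dyb = x4 - x3, y4 - y3
    denom = dxa * dyb - dya * dxb          # cross product of the two directions
    if abs(denom) < eps:
        return None
    # Parameter along line A at which it meets line B.
    t = ((x3 - x1) * dyb - (y3 - y1) * dxb) / denom
    return (x1 + t * dxa, y1 + t * dya)
```

For example, a horizontal segment from (0, 0) to (1, 0) and a vertical segment from (3, 1) to (3, 2) do not touch, but their extensions meet at the vertex (3, 0).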




Figure  3.  Image  intensity  for  a  typical  scene  used  in  the  2-D  match¬ 
ing  experiments. 



Figure 4. The edge segments produced by applying the Canny edge
detector and a linear segmentation algorithm to Figure 3.


Figure 5. The line segments superimposed on the intensity data of
Figure 3.


Figure 6. An intermediate match between the model and the scene.


Figure  7.  The  final  match  position.  The  model  is  now  within  toler¬ 
ance  limits  to  the  line  positions  in  the  scene. 




Figure 8. A 3-D matching experiment based on vertex assignments.
The match is shown in an intermediate stage of proposed
assignments.




OBJECT  MODEL  LIBRARY 

It is expected that a relatively large number of models will
be required to analyze the types of environments encountered
in the autonomous vehicle application. For example,
road obstacles could arise from a large variety of objects, both
natural and man-made. The need for a library of models
raises the question of how to organize and index the models
so that the model matching phase can be carried out
efficiently.

The  development  of  a  model  taxonomy  and  indexing 
scheme  is  another  interesting  application  of  the  technology 
we  are  developing  for  geometric  reasoning.  We  propose  that 
a  hierarchy  of  geometric  concepts  serves  as  a  useful  starting 
point  to  develop  model  characteristics  that  are  effective 
indices  into  the  library.  For  example,  the  distinction  between 
convex  and  non-convex  can  be  determined  from  an  analysis 
of  the  scene  on  the  basis  of  range-jump  boundaries  and 
occlusion  edges. 

Other  geometric  properties  such  as  colinearity,  symmetry, 
and  genus  should  also  be  useful  in  classifying  the  models. 
The  invariance  of  these  properties  to  perspective  transforma¬ 
tion  must  also  be  determined  in  order  to  evaluate  their 
effectiveness  for  model  indexing. 

OBJECT  DESCRIPTION  AND  REPORTING 

After the successful match of a model into the scene it is
possible to answer queries about the scene and to report
information needed for navigation and planning. It is
expected that the scene data provides at best a fragmentary
description that must be augmented by information derived
from the model. The simplest example is the use of 3-D
information from the model to describe portions of the
object that are not visible. Such information is useful in
planning vehicle paths around road obstacles.

The matching of landmarks in the scene allows reference
to maps and terrain data, which is useful for navigation. It is
likely that the matching process here is much easier than for
obstacles and tactical vehicles. The process is characterized by
tracking a known object under a small range of possible
transformations. Once the landmark has been located, a simple
predictive model for vehicle motion can update the
perspective transformation.

THE MODEL FORMATION PROCESS

The main activities to be carried out in the off-line
portion of the system are model construction and geometric
reasoning. The goal of these processes is to create models and
matching rules to support efficient on-line object recognition.

There are two main approaches to the formation of
models under investigation at present. The first involves a
manual process where a wire-frame or surface model is
created using a CAD solid modeling system. This model is
based on a priori information such as a set of mechanical
drawings for the object, or is partially based on measurements
taken from images or from the object itself. In our
current implementation, we are using the Symbolics
S-Geometry package as a means for creating 3-D models. An
example of model editing is shown in Figure 9.

Figure 9. A 3-D model being prepared by the Symbolics
S-Geometry solid modeling system.

The second approach is based on direct learning from 2-D
or 3-D images of the object [Kanade 1983], [Faugeras 1984].
In this approach a 3-D model of the object is obtained by
stereo matching and direct 3-D ranging. The description of
the object is either a set of connected surfaces or a 3-D
volume model. This segmentation is derived from the
information in the scene. In the 2-D matcher described earlier,
the model is simply selected line segments taken from an
isolated view of the object.

Also under way are efforts to obtain 3-D object models
from sensor data. The current experiments are directed at
the extraction of a wire frame of an object by extracting
curves of high surface curvature in range data. These wire
frames are then interpreted as solids by computing 2-cycle
closures of space [Wesley and Markovsky 1982]. The
information from several views can be combined by forming
boolean intersections of the solid figures taken from each
view.

THE  ROLE  OF  GEOMETRIC  REASONING 

There are two central issues that arise from the procedures
just described: 1) Specification of rules for matching
object features across multiple perspective views; 2) Segmentation
of object surface and volume into a compact and
effective object representation. We propose that geometric
reasoning, using powerful algebraic deduction methods, can
significantly advance the state of the art in these two areas.

The formation of rules for matching features in stereo
pairs or larger sets of multiple views is currently based on a
heuristic analysis of the properties of the perspective image
transformation and assumed properties of the objects in the
scene such as polyhedra [Shafer et al. 1983].

An  Algebraic  Approach  to  Geometric  Reasoning 

We have been studying the application of a new method
in geometric reasoning based on algebraic manipulation [Wu
1978]. The reasoning proceeds by manipulating a set of
hypotheses that are represented as polynomials and attempts
to establish the validity of a conclusion polynomial. The
conclusion is shown to follow from the hypotheses if the
conclusion can be expressed as a linear expansion in terms of the
hypotheses.




A proof is established by dividing the conclusion polynomial
by each of the hypotheses in turn. The remainder after
the first division is used as the polynomial to be divided by
the next hypothesis and so on. After all the hypotheses have
been applied, the final remainder is checked. If the
remainder is identically zero, then the conclusion is known to
follow from the hypotheses.
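A small worked case may make the division sequence concrete. The polynomials below are hypothetical and chosen only for illustration; they are not taken from the vision problems discussed in this paper. The hypotheses are applied starting with the one that introduces the last variable, which is the usual order for a triangular set.

```latex
\begin{align*}
h_1 &= x_1 - a, \qquad h_2 = x_2 - x_1^2, \qquad g = x_2 - a^2 \\
g &= 1 \cdot h_2 + (x_1^2 - a^2)
  && \text{divide } g \text{ by } h_2 \text{ in } x_2 \\
x_1^2 - a^2 &= (x_1 + a)\, h_1 + 0
  && \text{divide the remainder by } h_1 \text{ in } x_1 \\
g &= h_2 + (x_1 + a)\, h_1
  && \text{final remainder is zero}
\end{align*}
```

Because the final remainder vanishes identically, g is a linear expansion in the hypotheses, so g = 0 whenever h_1 = 0 and h_2 = 0.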

If the final remainder is not zero, then it is not possible
to conclude that the theorem is necessarily false. However,
some very useful insights can be gained by determining
additional conditions under which the final remainder does
vanish. These conditions can be considered as additional
hypotheses necessary to make the conclusion valid. Such
conditions can provide useful insight into the problem under
consideration. In many cases, the additional conditions
correspond to unusual special cases that should not be
overlooked in developing programs that use the geometric
properties described in the theorem.

The procedure just described assumes that the hypotheses
are in triangular form so that the variables are introduced
one at a time in successive hypothesis polynomials. In
symbolic form, if the variables are x_1, x_2, ..., x_n, then the
hypotheses should appear as

    h_1(x_1)
    h_2(x_1, x_2)
    ...
    h_n(x_1, x_2, ..., x_n)

INTERPRETING  THE  REMAINDER  -  AN  EXAMPLE 
FROM  VISION 

An important aspect of the algebraic approach to theorem
proving in geometry is that it is often possible to derive new
conditions or constraints that were not obvious in the initial
formulation of the problem. A mechanism for discovering
new constraints is provided by examining both the degenerate
conditions and the terms involved in any remainder. If
a geometric interpretation can be made that causes the
remainder to vanish, then this condition can be considered to
be an additional hypothesis that is necessary for the theorem
to be valid.

Likewise, the requirement that a degeneracy condition
should never be zero often imposes a new hypothesis for
the validity of the theorem that was not understood in the
original formulation. An example of this last case was
observed in the parallel lines problem. In that example, it
was implicitly assumed that points b and c were distinct.
However, such a condition must be an explicit hypothesis in
order for the theorem to hold rigorously. These types of
conditions often cause trouble in implementing programs to
carry out geometric operations and relations. The process of
discovering them in an examination of the proof of related
theorems may be more efficient than exhaustive testing of
the program.

To show the value of placing a geometric interpretation
on the remainder, consider the problem illustrated in Figure
10. We are considering the standard arrangement of perspective
viewing. The viewplane lies in the x-z plane and the
viewing direction is along the positive y axis. The
hypotheses are derived from the standard equations of
perspective as well as the existence of two lines in the viewplane


Figure 10. The perspective viewing of two lines in space. The
horizon and vanishing points are shown in the box.


for which the vanishing point of each line has been given. In
homogeneous coordinates, the vanishing point coordinate
vector of a line, V_p, in the viewplane can be related to its
direction vector, W, in space. The hypotheses are:

    h_1: W_{1y} V_{p1x} - f W_{1x} = 0

    h_2: W_{1y} V_{p1z} - f W_{1z} = 0

    h_3: W_{2y} V_{p2x} - f W_{2x} = 0

    h_4: W_{2y} V_{p2z} - f W_{2z} = 0

These equations express the location of each component of
the vanishing points in terms of the direction of the lines,
W_1 and W_2. The parameter, f, is the distance of the eyepoint
from the viewplane. This distance is approximately equal to
the lens focal length in a simple optical imaging system.

These hypotheses do not constrain the direction of the
lines in space, but merely the dependence of the vanishing
point locations on the line directions. Let us try to prove
that the lines are, in fact, perpendicular in space. We have
no reason to suppose that this theorem is valid, so we should
expect to still have a remainder after dividing the conclusion
by the hypotheses. The conclusion is

    g: W_{1x} W_{2x} + W_{1y} W_{2y} + W_{1z} W_{2z} = 0

The dependent variable set is [W_{1x}, W_{1z}, W_{2x}, W_{2z}]. With
this choice, the hypotheses are already in triangular form.
After synthetic division we obtain an expansion for g.

    f^2 g = -W_{2y} V_{p2x} h_1 - W_{2y} V_{p2z} h_2 - f W_{1x} h_3 - f W_{1z} h_4 + R

The remainder R, after factoring, is

    R = W_{1y} W_{2y} (f^2 + V_{p1x} V_{p2x} + V_{p1z} V_{p2z})

The degenerate condition requires that

    f \neq 0

This condition is an actual geometric degeneracy that
corresponds to the viewpoint lying within the viewplane. In
this case, the perspective transformation is ill-defined since
all of space collapses into the viewpoint.
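The expansion and remainder can be checked mechanically. The sketch below assumes the hypotheses read h: W_y V_p - f W = 0 for each vanishing point component, and uses one particular set of cofactors obtained by dividing in the order h_4, h_3, h_2, h_1; other division orders give different cofactors but the same remainder. Since the expansion is a polynomial identity, testing it on random integers is an exact check. The function name is hypothetical.

```python
import random


def expansion_holds(trials=200, seed=0):
    """Check f^2 * g == sum(cofactor_i * h_i) + R on random integer inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        (f, w1x, w1y, w1z, w2x, w2y, w2z,
         v1x, v1z, v2x, v2z) = (rng.randint(-9, 9) for _ in range(11))
        # Hypotheses: vanishing point components in terms of line directions.
        h1 = w1y * v1x - f * w1x
        h2 = w1y * v1z - f * w1z
        h3 = w2y * v2x - f * w2x
        h4 = w2y * v2z - f * w2z
        # Conclusion: the two lines are perpendicular in space.
        g = w1x * w2x + w1y * w2y + w1z * w2z
        # Remainder left after dividing by all four hypotheses.
        r = w1y * w2y * (f * f + v1x * v2x + v1z * v2z)
        rhs = -w2y * v2x * h1 - w2y * v2z * h2 - f * w1x * h3 - f * w1z * h4 + r
        if f * f * g != rhs:
            return False
    return True
```

The factor f^2 multiplying g is where the degenerate condition comes from: the expansion says nothing about g when f = 0.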

Upon inspecting the remainder, several observations can
be made. First, the remainder will vanish if either W_{1y} or
W_{2y} is zero. These cases correspond to one or both of the lines
being parallel to the viewplane. In such a case, the vanishing
point of the line will be at infinity in the viewplane, and the
perspective transformation equations become indeterminate.
This case should be ruled out since it violates the
assumptions underlying the hypotheses.




The second observation is related to the second factor in
the remainder. In vector notation this can be rewritten as

    f^2 + V_{p1} \cdot V_{p2} = 0

We have discovered a new hypothesis necessary to make the
original theorem valid (see footnote 2). In order for two lines
in a perspective image to correspond to perpendicular lines in
space, this additional relationship between the vanishing points
and the viewpoint distance must hold. This relation seems a useful
constraint since it depends only on the focal length, f, of the
image formation system, and the direction of the lines in the
image plane. Several pairs of lines which satisfy the
constraint for f = 1 are shown in Figure 11.
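The constraint is easy to confirm numerically for specific lines. The sketch below assumes the viewing geometry described above (viewplane in the x-z plane, viewing direction along y, eyepoint at distance f) so that a line with direction W has its vanishing point at (f W_x / W_y, f W_z / W_y). The function names and the sample directions are hypothetical.

```python
def vanishing_point(direction, f):
    """Viewplane (x, z) coordinates of the vanishing point of a 3-D direction."""
    wx, wy, wz = direction
    if wy == 0:
        raise ValueError("line parallel to the viewplane: no finite vanishing point")
    return (f * wx / wy, f * wz / wy)


def perpendicularity_residual(vp1, vp2, f):
    """f^2 + Vp1 . Vp2, which is zero when the two lines are perpendicular."""
    return f * f + vp1[0] * vp2[0] + vp1[1] * vp2[1]


f = 1.0
w1 = (1.0, 2.0, 0.0)
w2 = (2.0, -1.0, 3.0)   # w1 . w2 = 0, so these lines are perpendicular
w3 = (1.0, 1.0, 1.0)    # w1 . w3 = 3, not perpendicular to w1

res_perp = perpendicularity_residual(vanishing_point(w1, f), vanishing_point(w2, f), f)
res_other = perpendicularity_residual(vanishing_point(w1, f), vanishing_point(w3, f), f)
```

Here `res_perp` is zero and `res_other` is not. In general the residual equals f^2 (W_1 . W_2) / (W_{1y} W_{2y}), which is the remainder divided by the squares of the two y components.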

Efforts are currently under way to provide a conceptual
framework of geometric knowledge which can help direct and
interpret the results of algebraic proof [Kapur et al. 1985].
The concepts are small groups of formal axioms that are part
of standard geometry theory. The current issues under
investigation are

•  Representation and control for proof sequences.

•  Inheritance mechanisms within the conceptual network.

•  Techniques for embedding new theorems into existing
concepts.

•  Semi-automatic use of graphics to illustrate the theories.

•  The integration of syntactic and logical operations.

The final goal is to create a reasoning tool that can work
cooperatively in the development of image understanding
algorithms. We expect that the initial benefits will be the
proof of correctness and an understanding of the limitations
of the geometric assumptions and algorithms associated with
perspective matching.

References 

Asada, H. and Brady, J.M., “The Curvature Primal Sketch,”
Proc. Workshop on Computer Vision: Representation and
Control, 1984, p. 8.

Ayache, N., “A Model-Based Vision System to Identify and
Locate Partially Visible Parts,” Proc. CVPR, Washington,
1983.

Ballard, D.H., “Generalizing the Hough Transform to Detect
Arbitrary Shapes,” Pattern Recognition 13, 1981, p. 111.

Brooks, R.A., “Symbolic Reasoning Among 3D Models and
2D Images,” Artificial Intelligence 17, 1981, p. 285.

Canny, J., “Finding Lines and Edges,” Report No. AI-TR-720,
MIT Artificial Intelligence Laboratory, 1983.

Faugeras, O.D., “New Steps Toward a Flexible 3-D Vision
System for Robotics,” Proc. 7th International Joint Conf. on
Pattern Recognition, 1984, p. 796.


2. Takeo Kanade indicated that this result has been published earlier
[Shafer et al. 1983]. The result was initially unknown to the author and
derived independently in the manner described.


Figure 11. Three pairs of lines with vanishing points that satisfy the
new constraint.


Goad, C., “Special Purpose Automatic Programming for
Model Based Vision,” Stanford University Report,
ADP001 198.

Grimson, W.E.L. and Lozano-Perez, T., “Search and Sensing
Strategies for Recognition and Localization of Two and
Three Dimensional Objects,” Proc. 3rd International
Symposium on Robotics Research, 1985.

Hermann, M., Kanade, T., and Kuroe, S., “Incremental
Acquisition of a Three Dimensional Scene Model From
Images,” IEEE PAMI-6, 1984, p. 331.

Kapur, D., Mundy, J., Musser, D., Narendran, P., “Reasoning
About 3D Space,” Proc. IEEE Conf. Robotics and
Automation, 1985, p. 405.

Markovsky, G. and Wesley, M.A., “Fleshing Out Wire
Frames,” IBM J. Res. Dev. 24, 1980, p. 582.

Mundy, J.L. and Joynson, R.E., “Automatic Visual
Inspection Using Syntactic Analysis,” Proc. IEEE Conf.
Pattern Recognition and Image Processing, 1977, p. 144.

Shafer, S., Kanade, T., Kender, J., “Gradient Space Under
Orthography and Perspective,” Computer Vision, Graphics and
Image Processing 24, 1983, p. 182.

Tomek, I., “Piecewise Linear Approximation,” IEEE
Transactions on Computers C-23, 1974, p. 445.

Porter, G.B. and Mundy, J.L., “A Model Driven Visual
Inspection System,” First International Symposium on Robotics
Research, MIT Press, J.M. Brady and R.P. Paul ed., 1983.

Quam, L., “Image-Calc User's Manual,” SRI, 1984.

Wu Wen-tsun, “On the Decision Problem and Mechanization
of Theorem Proving in Elementary Geometry,” Scientia
Sinica 21, 1978, pp. 159-172.




STRUCTURE  AND  MOTION  FROM  IMAGES 

J.K.  Aggarwal 
Amar  Mitiche 

Computer  and  Vision  Research  Center 
The  University  of  Texas  at  Austin 
Austin, TX 78712

ABSTRACT

This paper reviews the research on structure and
motion from images performed at the Computer and Vision
Research Center of The University of Texas at Austin.
Early work is briefly reviewed and the more recent work is
described in greater detail. The recent work has focussed on
developing methods which explicitly exploit the rigidity of
objects in motion. Two methods are presented here. The
first method relies on the observation of five points in two
images, and uses the fact that distances between points of a
rigid object do not change as a result of motion. The second
method depends on the observation of four lines in three
views and uses the fact that angles between the lines in a
rigid configuration of lines do not change as a result of
motion. In both cases, the transformations between views are
computed and experimental results are presented.

1. INTRODUCTION

The perception and interpretation of motion of objects
in space are among the fundamental capabilities of the
human visual system. These capabilities have been and
continue to be extensively investigated from the psychological
and psychophysical viewpoints. More recently, computer
vision has been concerned with the processing of sequences
or collections of images with the objective of collecting
information from the set as a whole that may not be obtained
from any one image by itself. The study of cloud motion
[1,2] may have been an early motivation that gave significant
impetus to the analysis of motion by computers. However,
at the present time there are a multitude of applications driving
computer vision research into various facets of the motion of
objects in space. The application areas include medicine,
autonomous navigation, tomography, communications and
television, dancing and choreography, meteorology, robotics,
animation and so on. Last, and certainly not least,
results in machine vision may contribute to the understanding
of the functions of the human and animal visual systems.
The broad interest is evidenced by the numerous workshops,
special issues and conferences [3-16] almost exclusively
devoted to the analysis of time-varying imagery.

The present paper describes work done at the Computer
and Vision Research Center of The University of Texas at
Austin on the subject of motion. Early work is briefly
surveyed and the more recent work is described in greater
detail.

This research was supported in part by the Air Force Office
of Scientific Research under Contract No. F49620-85-K-0007.

2. OVERVIEW


The study of motion by computer at the Computer and
Vision Research Center started with the work of
Aggarwal and Duda [17], who proposed a mathematical
model of cloud motion to use for the detection and
measurement of this motion. The model consisted of planar rigid
polygons. The motion of each polygon occurred in one of
several parallel planes and involved both translation and
rotation, and the motions could be correlated or independent.
The objective of the study was to use a time-sequence of
images of the polygons to compute their motions and to
segment the scene into its component polygons. The number of
planes and polygons were unknowns of the problem. The
restriction of the figure shapes to polygons was relaxed
in later studies [18,19] to include curvilinear objects.
An approach based on partial matching of contours in
successive frames was developed to characterize motion and
recognition of objects in successive frames.

The use of motion as a cue to image segmentation has
been considered in [20] and [21]. A method is described in
[20] where the images of moving objects are extracted by
"subtracting" consecutive frames in a time-ordered sequence
of frames and then focusing attention around "non-zero"
areas of the "difference" frame; these non-zero areas indicate
the presence of a moving object. An estimate/refine
scheme was finally applied to extract the desired image
regions (segmentation on the basis of motion). The study
described in [21] used the idea of frame differencing coupled
with a region growing operation, which started at the non-zero
regions of the difference frame, to extract the image of
moving objects.
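The frame-differencing idea can be illustrated with a minimal sketch. It is not the method of [20] or [21]: the function names, the threshold parameter and the bounding-box step are assumptions made for illustration, and the estimate/refine and region-growing stages are not reproduced.

```python
def difference_mask(prev_frame, next_frame, threshold=0):
    """Return a binary mask that is 1 where the two frames differ."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]


def bounding_box(mask):
    """Smallest (top, left, bottom, right) box around the non-zero cells."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    if not cells:
        return None  # no change detected between the frames
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


# A 2x2 bright square moves one pixel to the right between two frames.
frame_a = [[0, 0, 0, 0, 0, 0],
           [0, 9, 9, 0, 0, 0],
           [0, 9, 9, 0, 0, 0],
           [0, 0, 0, 0, 0, 0]]
frame_b = [[0, 0, 0, 0, 0, 0],
           [0, 0, 9, 9, 0, 0],
           [0, 0, 9, 9, 0, 0],
           [0, 0, 0, 0, 0, 0]]

mask = difference_mask(frame_a, frame_b)
box = bounding_box(mask)
```

In this toy case the mask is non-zero only along the trailing and leading edges of the square, while the overlapping interior stays zero. This is one reason a follow-up step such as region growing from the non-zero areas is useful for recovering the full object image.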

All of the above mentioned studies are concerned with
two-dimensional analysis of motion. Although the actual
motions may be taking place in three-dimensional space,
there was no attempt to recover or interpret them. The
recovery and analysis of three-dimensional motion from
two-dimensional images was considered by Roach and
Aggarwal in [22,23]. Research described in [22] was concerned
with the problem of determining the motion in space
of planar-faced three-dimensional objects (blocks) from a
sequence of (two-dimensional) images of these objects. The
description of motions was qualitative and expressed as
toward or away from the viewing system as well as left/right
and up/down in the image plane. The strategy to recover
such a description was to locate objects in the image and
match them between frames. The image displacements of
objects were computed from the centroids of their visible
parts. Objects were segmented out by putting together
detected faces using a heuristic scheme based on occlusion
clues. Matching was multilevel, starting with the use of
object velocities and ending with local feature matching.


In [23] a quantitative evaluation of motion was sought.
An approach was developed by which the position and
motion of two or more points in space could be recovered
from their observation in two distinct views. Camera imaging
was modeled by central projection. Equations were
derived from the projective relations which express the
position of points in space in terms of the position of their
projection in the image plane. The unknowns were the position
of points in space with respect to one of the viewing
systems, as well as the parameters of the relative displacement
between these viewing systems. With six points, the
proposed formulation yielded eighteen nonlinear equations
in eighteen unknowns when scale was fixed. An
observation in [23] was that, unless a considerable number
of points was used, the formulation was highly unstable
numerically.

To overcome the problems associated with large systems
of equations, a method is proposed in [24] which uses
the usual projective relations and exploits a fundamental
property of rigid objects: the distances
between points of a rigid object do not change as a result of
motion. The explicit use of this property leads to a smaller
system of equations which involves only the position of
points in space and not the parameters of motion. Once
the position of points in space is known, however, recovering the
parameters of motion is a simple task. The approach, which
relies on the observation of five points in two views, is
shown to be numerically stable and well behaved.

Still using rigidity explicitly, an approach was
developed in [25] which used the observation of lines instead
of points. Rigidity was exploited by stating explicitly that
the angular configuration between the lines of a rigid structure
of lines does not change as a result of motion. The formulation
used projective relations based on the observation
of four lines in three views.

The study described in [26] was concerned with the
interpretation of motion for jointed objects in addition to
rigid objects. The main assumption was that motions are
screws (translation + rotation) where the direction of the
rotational axis remains fixed over short periods of time.
This assumption was a generalization of that of planar
motion considered in other studies which investigated the
same problem ([27]). With this fixed axis assumption, the
motion of any point on a rigid object relative to any other
point on the same object is a circle in a plane normal to the
fixed rotational axis. This circle projects as an ellipse in the
image plane. The algorithm proposed in [26] exploited this
constraint to interpret motion.

In the following sections, we will describe in greater
detail the methods proposed in [24] and [25].


3. STRUCTURE AND MOTION FROM POINT
CORRESPONDENCES

The problem of estimating the position and motion of
an object in space from the observation of a small number of
points in two distinct images of the object is considered in
this section. In general, the solution consists of developing
equations using projective relations for the observed points
involving both the three-dimensional coordinates of the
points and the parameters of motion. Then a typical counting
argument dictates the number of points that must be
observed in order to solve these relations. This usually has
led to a large set of complicated non-linear transcendental
equations. Linear methods were also considered when a
larger number of points could be observed in the images.

The new notion in the method presented in the following
is to use the principle of conservation of distances in
rigid objects. This principle, which is the subject of a
theorem in kinematics of solids, simply states an obvious
fact: distances in a rigid configuration of points do not
change during motion. It was shown in Mitiche [15,16] that
this characterization of rigid motion can lead to powerful
formulations of various structure and motion problems.

Consider the viewing system model shown in Figure 1.
$S_1$ and $S_2$ represent the camera coordinate systems at two
distinct viewpoints (obtained, say, by a moving camera at two
distinct instants of time or by two cameras from two
different viewpoints). The approach is to write that distances
between points of the rigid environment are the same
whether expressed in $S_1$ or in $S_2$. There is no mention, at
this stage, of the transformation that takes $S_1$ into $S_2$ or vice
versa. Also, we use the scalar notation for projective relations.
With this notation, each observed point contributes 2
variables (one for each coordinate system) and each pair of
points gives one equation. If we take 5 points and set one
variable arbitrarily to fix the global scale of the object, then
we end up with 10 equations in 9 unknowns. Although this
method does not provide an economy of points compared to
some of the other methods ([23] for instance), we nevertheless
arrive at a compact and robust formulation of the problem;
each of the second order equations we obtain involves
at most four of the nine position variables and none of the
motion variables. As a result, the numerical solution of the
system of equations is better behaved than it would otherwise
be. Moreover, multiple solutions can only differ by a
singular configuration or a reflection in space, once we fix
the global scale factor.

After we determine the position of points, solving for
the motion matrix is a simple matter of solving a 4 x 4
linear system of equations using 4 of the points. The actual
parameters of motion can then be recovered analytically from
the motion matrix.

3.1 Estimation of Object Position

Using the viewing system configuration shown in Figure 1,
a point $P_i$ in space with coordinates $(X_i, Y_i, Z_i)$ in $S_1$ and
$(U_i, V_i, W_i)$ in $S_2$ is imaged on $p_i$ on $I_1$ and $q_i$ on $I_2$. Because
$P_i$ is on line $C_1 p_i$, there exists a real number $\lambda_i > 1$ such that

$$X_i = \lambda_i x_i, \qquad Y_i = \lambda_i y_i, \qquad Z_i = (1 - \lambda_i) f \tag{1}$$

where $(x_i, y_i)$ are the coordinates of $p_i$ in the $I_1$-image




coordinate system and $f$ is the focal length. Similarly, $P_i$ is
on line $C_2 q_i$, and if $(u_i, v_i)$ are the coordinates of $q_i$ in the
$I_2$-image coordinate system then there exists $\gamma_i > 1$ such that

$$U_i = \gamma_i u_i, \qquad V_i = \gamma_i v_i, \qquad W_i = (1 - \gamma_i) f$$

The squared distance between points $P_i$ and $P_j$ expressed in
$S_1$ is therefore

$$d_{ij}^2(S_1) = (X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2$$

or

$$d_{ij}^2(S_1) = (\lambda_i x_i - \lambda_j x_j)^2 + (\lambda_i y_i - \lambda_j y_j)^2 + (\lambda_i - \lambda_j)^2 f^2$$

Similarly, the squared distance between $P_i$ and $P_j$ expressed
in $S_2$ is

$$d_{ij}^2(S_2) = (\gamma_i u_i - \gamma_j u_j)^2 + (\gamma_i v_i - \gamma_j v_j)^2 + (\gamma_i - \gamma_j)^2 f^2$$

Now the principle of conservation of distance allows us to
write (assuming, of course, identical units of measurement in
$S_1$ and $S_2$):

$$d_{ij}^2(S_1) = d_{ij}^2(S_2)$$

or

$$(\lambda_i x_i - \lambda_j x_j)^2 + (\lambda_i y_i - \lambda_j y_j)^2 + (\lambda_i - \lambda_j)^2 f^2 = (\gamma_i u_i - \gamma_j u_j)^2 + (\gamma_i v_i - \gamma_j v_j)^2 + (\gamma_i - \gamma_j)^2 f^2 \tag{2}$$

It may be seen that each point $P_i$ contributes two unknowns,
$\lambda_i$ and $\gamma_i$, and each pair of points $(P_i, P_j)$ gives one
second order equation (Equation (2)). Therefore, 5 points
yield 10 equations and 10 unknowns. The correspondence
between points of the sets in the two views is, of course,
assumed known. It may be noted that Equation (2) may be
rewritten equivalently using a scale factor, a consequence of
the fact that the scale of the observed structure of points cannot
be recovered. We can fix this scale by fixing the distance
of a point from one of the cameras, which amounts to
fixing arbitrarily one of the variables. Therefore, we end up
with a system of 10 equations in 9 unknowns. It may be
noted that each equation involves only 4 of the unknowns
and that the formulation so far does not involve the parameters
of the displacement between the two cameras. Because
these parameters do not appear in the equations and also
because only some of the unknowns of position appear in
each of them, the resulting system of equations may be
solved quite efficiently using existing numerical iterative
algorithms.

3.2 Determining Motion

When the position of the points has been computed,
determining the relative position of the cameras becomes a
simple matter. Indeed, if one takes 4 non-coplanar points (4
of the 5 observed points in space, or 3 of them with a fourth
one generated using the product of vectors defined by these
3 points) and calls $A_1$ and $A_2$ the matrices of homogeneous
coordinates of these in $S_1$ and $S_2$ respectively, the transformation
$M$ (in homogeneous coordinate form) that takes $S_1$ onto $S_2$
is given as:

$$A_2 = M A_1 \tag{3}$$

Since the 4 points are not coplanar, Equation (3) can be
solved for $M$.

Now if we decompose the motion into a rotation
through angle $\theta$ about an axis through the origin, the direction
cosines of which are $(n_1, n_2, n_3)$, followed by a translation
$(t_1, t_2, t_3)$, and if $M$ is written as

$$M = \begin{pmatrix} a_1 & a_2 & a_3 & 0 \\ a_4 & a_5 & a_6 & 0 \\ a_7 & a_8 & a_9 & 0 \\ b_1 & b_2 & b_3 & 1 \end{pmatrix}$$

then one can show that

$$t_1 = b_1, \qquad t_2 = b_2, \qquad t_3 = b_3$$

$$\cos\theta = \frac{a_1 + a_5 + a_9 - 1}{2}$$

$$\sin\theta = \frac{a_k - a_l}{2 n_m} \quad \text{(subscripts illegible in the scanned source)}$$

$$n_1^2 = \frac{a_1 - \cos\theta}{1 - \cos\theta}, \qquad n_2 = \frac{a_2 + a_4}{2 n_1 (1 - \cos\theta)}, \qquad n_3^2 = 1 - n_1^2 - n_2^2$$

Details of this derivation are found in [30] and [31].
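The linear step of Section 3.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each point is stored as a homogeneous row vector, so that the system solved is A1 M = A2 with the translation in the bottom row of M, matching the matrix layout above; the ordering in Equation (3) may follow a different convention. The helper names and the test motion are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is n x n and b is n x k (lists of lists); returns x as n x k.
    """
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: the four points are coplanar")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def motion_matrix(points_s1, points_s2):
    """4 x 4 matrix M with A1 M = A2 (points as homogeneous rows)."""
    a1 = [list(p) + [1.0] for p in points_s1]
    a2 = [list(p) + [1.0] for p in points_s2]
    return solve_linear(a1, a2)


def rotation_angle(m):
    """Rotation angle from the trace of the upper-left 3 x 3 block."""
    cos_theta = (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def move(point, angle, t):
    """Hypothetical test motion: rotate about Z by angle, then translate."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + t[0], s * x + c * y + t[1], z + t[2])


# Four non-coplanar points and their positions after a known motion.
points_s1 = [(0.0, 0.0, -5.0), (1.0, 0.0, -6.0),
             (0.0, 1.0, -7.0), (1.0, 1.0, -9.0)]
points_s2 = [move(p, math.radians(30.0), (1.0, 2.0, 3.0)) for p in points_s1]

M = motion_matrix(points_s1, points_s2)
theta = rotation_angle(M)
translation = M[3][:3]
```

The angle is taken from the trace because that relation does not depend on the row or column convention; recovering the signed axis requires the remaining relations of the derivation in [30].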




3.3  Experimental  Results 

The algorithm for computing the position of points has
been tested on synthetic data and camera-acquired pictures of
real objects.

We used the FORTRAN subroutine LMDER [32] to
solve the system of equations (2). This subroutine minimizes
the sum of squares of m non-linear functions in n variables
by a modification of the Levenberg-Marquardt algorithm.
LMDER is an iterative procedure which requires initial
guesses to the solution of the system of equations. Also,
LMDER requires the Jacobian of the functions involved in
this system. In our context the Jacobian can be calculated
quite easily.
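To make the remark about the Jacobian concrete, the sketch below writes one distance-conservation equation of system (2) as a residual in the scalar multipliers, called lam and gam here, together with its four non-zero partial derivatives. It is an illustration under assumptions and not the code used in the experiments: the function names, the synthetic points and the test motion are hypothetical, and the projection helper assumes the model implied by Equation (1) (projection centre at (0, 0, f), image plane Z = 0).

```python
import math


def project(point, f):
    """Central projection; returns the image point and the multiplier lam."""
    x, y, z = point
    lam = 1.0 - z / f          # from Z = (1 - lam) * f
    return (x / lam, y / lam), lam


def residual(lam_i, lam_j, gam_i, gam_j, p_i, p_j, q_i, q_j, f):
    """Left side minus right side of Equation (2) for the pair (P_i, P_j)."""
    (xi, yi), (xj, yj) = p_i, p_j
    (ui, vi), (uj, vj) = q_i, q_j
    d1 = ((lam_i * xi - lam_j * xj) ** 2 + (lam_i * yi - lam_j * yj) ** 2
          + (lam_i - lam_j) ** 2 * f ** 2)
    d2 = ((gam_i * ui - gam_j * uj) ** 2 + (gam_i * vi - gam_j * vj) ** 2
          + (gam_i - gam_j) ** 2 * f ** 2)
    return d1 - d2


def jacobian_row(lam_i, lam_j, gam_i, gam_j, p_i, p_j, q_i, q_j, f):
    """Partial derivatives w.r.t. (lam_i, lam_j, gam_i, gam_j)."""
    (xi, yi), (xj, yj) = p_i, p_j
    (ui, vi), (uj, vj) = q_i, q_j
    ax, ay, az = lam_i * xi - lam_j * xj, lam_i * yi - lam_j * yj, lam_i - lam_j
    bx, by, bz = gam_i * ui - gam_j * uj, gam_i * vi - gam_j * vj, gam_i - gam_j
    return [2.0 * (ax * xi + ay * yi + az * f ** 2),
            -2.0 * (ax * xj + ay * yj + az * f ** 2),
            -2.0 * (bx * ui + by * vi + bz * f ** 2),
            2.0 * (bx * uj + by * vj + bz * f ** 2)]


def rigid_motion(point, angle, t):
    """Hypothetical motion: rotate about the Y axis, then translate by t."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z + t[0], y + t[1], -s * x + c * z + t[2])


f = 1.0
P_i, P_j = (0.5, -0.2, -4.0), (-1.0, 0.7, -6.0)          # coordinates in S1
R_i, R_j = (rigid_motion(P, 0.2, (0.3, -0.1, -0.5)) for P in (P_i, P_j))
(p_i, lam_i), (p_j, lam_j) = project(P_i, f), project(P_j, f)
(q_i, gam_i), (q_j, gam_j) = project(R_i, f), project(R_j, f)

r_true = residual(lam_i, lam_j, gam_i, gam_j, p_i, p_j, q_i, q_j, f)
row = jacobian_row(lam_i, lam_j, gam_i, gam_j, p_i, p_j, q_i, q_j, f)
```

At the true multipliers the residual vanishes up to rounding, because the motion is rigid. Each row of the full Jacobian has only these four non-zero entries, which reflects the statement that each equation involves only 4 of the unknowns.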

Two views of a jeep are shown in Figure 2a and 2b.
The five points used are marked on the first view. The average
error on the position of points for this typical example is
6% of the average distance between the points. For more
details on this experiment, refer to [24]. Also refer to [24]
for other examples on camera-acquired pictures and for statistical
results on synthetic data.


4.  INTERPRETATION  OF  STRUCTURE  AND 
MOTION  FROM  LINE  CORRESPONDENCES 

In this section we discuss a solution to the problem of
determining structure and motion from line correspondences.
More precisely, the problem is that of recovering
the orientation and position of a set of lines in space from
multiple views of these lines, as well as the relative displacement
between the views. We consider undirected lines with
no other cue such as points. Moreover, the lines are
assumed to be in general position in space. The method is
based on the observation of four lines in three distinct views.
The case of two views has been shown to be inherently
ambiguous [25]. Our method here exploits the principle of
invariance of angular configuration with respect to rigid
motion in addition to the usual projective constraints.

A viewing system is again represented by a central projection
model. The geometric configuration with such a
model is depicted in Figure 3. The projection center of the
viewing system $S$ is $C$ and the image plane is $I$. The coordinate
system in space is $(O; X, Y, Z)$ where $(O; X, Y)$ is the image
coordinate system. The coordinates of $C$ in $S$ are $(0, 0, f)$.
The principal axis of the camera, which this viewing system
models, is aligned with the Z-axis. A point $P$ in space projects
on point $p$ in the image as indicated in Figure 1 and
line $\Lambda$ would correspondingly project on line $l$.

When we use $m$ views we will have $m$ such viewing
systems. With $m$ views, we can write $m-1$ constraints on the
orientation of lines for each possible pairing in the set of
observed lines. Projective constraints are such that each line
will contribute one unknown per view. With three views
(the case we will treat in detail) for instance, four lines yield
twelve equations in twelve unknowns which are solved for
the orientation of lines in space. The rotational components
of motion between the viewing systems are then readily
recovered from these orientations. Finally, the translation
components of motion (and therefore the position of lines in
space) may be recovered.
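The counting arguments used for points (Section 3) and for lines (this section) can be checked with a few lines of code. This is only bookkeeping under the counts stated in the text; the function names are assumptions, and matching counts are a necessary condition for solvability, not a proof of uniqueness.

```python
from math import comb


def point_counts(n_points):
    """Equations and unknowns for n points seen in two views."""
    equations = comb(n_points, 2)   # one distance equation per pair of points
    unknowns = 2 * n_points - 1     # two multipliers per point, one fixed for scale
    return equations, unknowns


def line_counts(n_lines, n_views):
    """Equations and unknowns for n lines seen in m views."""
    equations = (n_views - 1) * comb(n_lines, 2)  # m-1 constraints per pair of lines
    unknowns = n_lines * n_views                  # one unknown per line and per view
    return equations, unknowns


five_points = point_counts(5)       # the case treated in Section 3
four_lines = line_counts(4, 3)      # the case treated in this section
two_views = line_counts(4, 2)       # fewer equations than unknowns
```

With these counts, five points give 10 equations in 9 unknowns, four lines in three views give 12 equations in 12 unknowns, and four lines in only two views give 6 equations in 8 unknowns.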


4.1  Angular  Invariance 

The principle of angular invariance states that angles
between lines of a set of lines in space do not change as a
result of a rigid motion of this set of lines. In our investigation
we consider undirected lines on which a sense cannot be
fixed. Therefore, the constraint on angular invariance we
exploit is one that preserves the quantity $\cos^2\theta$ where $\theta$ is
any of the two angles $\theta_1$ or $\theta_2$ between two observed lines in
space.

Let $\Lambda_1$ and $\Lambda_2$ be 2 lines in space the projections of
which in $S$ are respectively $l_1$ and $l_2$. Then, for any two
non-zero vectors $V_1$ on $\Lambda_1$ and $V_2$ on $\Lambda_2$ we have:

$$\cos^2\theta = \left[ \frac{V_1 \cdot V_2}{|V_1|\,|V_2|} \right]^2 \tag{4}$$

If we observe the same lines $\Lambda_1$ and $\Lambda_2$ in another reference
system $S'$, then one can write another expression for $\theta$:

$$\cos^2\theta = \left[ \frac{V'_1 \cdot V'_2}{|V'_1|\,|V'_2|} \right]^2$$

where $V'_1$, $V'_2$ have meanings similar to $V_1$, $V_2$.

Then the principle of invariance of angular configuration for
$\Lambda_1$, $\Lambda_2$ between $S$ and $S'$ states that

$$\left[ \frac{V_1 \cdot V_2}{|V_1|\,|V_2|} \right]^2 = \left[ \frac{V'_1 \cdot V'_2}{|V'_1|\,|V'_2|} \right]^2$$

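Now let $P_1$, $P_2$ be the end points of the vector $V_1$ on $\Lambda_1$;
$P_1$ and $P_2$ have, in this order, projections $p_1$, $p_2$ with image
coordinates $(x_1, y_1)$ and $(x_2, y_2)$. Projective constraints, in their
scalar form, allow us to write the following equations, where
$(X_1, Y_1, Z_1)$ and $(X_2, Y_2, Z_2)$ are the coordinates of $P_1$ and $P_2$
respectively: there exists $\lambda_1 > 1$ such that

$$X_1 = \lambda_1 x_1, \qquad Y_1 = \lambda_1 y_1, \qquad Z_1 = (1 - \lambda_1) f \tag{5}$$

Similarly, there is $\lambda_2 > 1$ such that

$$X_2 = \lambda_2 x_2, \qquad Y_2 = \lambda_2 y_2, \qquad Z_2 = (1 - \lambda_2) f \tag{6}$$

The components of $V_1$ are therefore:

$$V_{1x} = \lambda_2 x_2 - \lambda_1 x_1, \qquad V_{1y} = \lambda_2 y_2 - \lambda_1 y_1, \qquad V_{1z} = (\lambda_1 - \lambda_2) f \tag{7}$$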


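If we divide each expression in (7) by $\lambda_1$ and let $\lambda = \lambda_2 / \lambda_1$, then
the following unit vector $(A_1, A_2, A_3)$ is in the direction of $V_1$:

$$A_1 = \frac{\lambda x_2 - x_1}{a}, \qquad A_2 = \frac{\lambda y_2 - y_1}{a}, \qquad A_3 = \frac{(1 - \lambda) f}{a} \tag{8}$$

where $a = \sqrt{(\lambda x_2 - x_1)^2 + (\lambda y_2 - y_1)^2 + (1 - \lambda)^2 f^2}$.

If $Q_1$ and $Q_2$ are the endpoints of $V_2$ with image coordinates
$(u_1, v_1)$ and $(u_2, v_2)$ respectively, then it is straightforward
to write expressions similar to those in (8) for $V_2$:
there exists $\gamma$ such that the vector $(B_1, B_2, B_3)$ is a unit vector
in the direction of $V_2$:

$$B_1 = \frac{\gamma u_2 - u_1}{b}, \qquad B_2 = \frac{\gamma v_2 - v_1}{b}, \qquad B_3 = \frac{(1 - \gamma) f}{b} \tag{9}$$

where $b = \sqrt{(\gamma u_2 - u_1)^2 + (\gamma v_2 - v_1)^2 + (1 - \gamma)^2 f^2}$.

Now we can write

$$\cos^2\theta = (A_1 B_1 + A_2 B_2 + A_3 B_3)^2 \tag{10}$$

If we have another view of $\Lambda_1$ and $\Lambda_2$ where the viewing
system of the second view is modelled by reference system
$S'$, then we can write expressions similar to those in (8),
(9), and (10), i.e., there are $\lambda'$ and $\gamma'$ such that

$$\cos^2\theta = (A'_1 B'_1 + A'_2 B'_2 + A'_3 B'_3)^2 \tag{11}$$

where

$$A'_1 = \frac{\lambda' x'_2 - x'_1}{a'}, \qquad A'_2 = \frac{\lambda' y'_2 - y'_1}{a'}, \qquad A'_3 = \frac{(1 - \lambda') f}{a'} \tag{12}$$

and

$$B'_1 = \frac{\gamma' u'_2 - u'_1}{b'}, \qquad B'_2 = \frac{\gamma' v'_2 - v'_1}{b'}, \qquad B'_3 = \frac{(1 - \gamma') f}{b'} \tag{13}$$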


where $(x'_1, y'_1)$ are the image coordinates of a point on line $\Lambda_1$;
$(x'_2, y'_2)$ are the image coordinates of another point on $\Lambda_1$;
$(u'_1, v'_1)$ are the image coordinates of a point on $\Lambda_2$; $(u'_2, v'_2)$ are
the image coordinates of another point on $\Lambda_2$. All these
image coordinates are measured in the image plane of $S'$, and
$a'$, $b'$ are obtained from the expressions for $a$, $b$ by priming
every quantity.

The principle of invariance of angular configuration for
line $\Lambda_1$ and line $\Lambda_2$ between viewing systems $S$ and $S'$ is then
written as

$$(A_1 B_1 + A_2 B_2 + A_3 B_3)^2 = (A'_1 B'_1 + A'_2 B'_2 + A'_3 B'_3)^2 \tag{14}$$

or in expanded form

$$\frac{\left[ (\lambda x_2 - x_1)(\gamma u_2 - u_1) + (\lambda y_2 - y_1)(\gamma v_2 - v_1) + (1 - \lambda)(1 - \gamma) f^2 \right]^2}{a^2 b^2} = \frac{\left[ (\lambda' x'_2 - x'_1)(\gamma' u'_2 - u'_1) + (\lambda' y'_2 - y'_1)(\gamma' v'_2 - v'_1) + (1 - \lambda')(1 - \gamma') f^2 \right]^2}{a'^2 b'^2} \tag{15}$$

Image coordinates involved in (15) are, of course,
obtained from any two points on the images of line $\Lambda_1$ and
any two points on the images of line $\Lambda_2$ in the image planes
of $S$ and $S'$. The unknowns are then $\lambda$, $\gamma$, $\lambda'$, $\gamma'$; i.e., one
unknown is involved per line and per view. We already
have mentioned that the case of two views is highly ambiguous
(refer to [25] for a proof).
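The angular invariance constraint can be checked numerically on synthetic data. The sketch below is an illustration under assumptions, not the solver of [25]: it builds two lines from known 3-D end points, so the ratios lambda and gamma are computed from the true geometry instead of being solved for, and then evaluates the squared cosine from image quantities only, in two views related by a hypothetical rigid motion. The helper names are invented for this example, and the projection model is the one stated in this section (centre at (0, 0, f), image plane Z = 0).

```python
import math


def project(point, f):
    """Central projection; returns the image point and its multiplier."""
    x, y, z = point
    lam = 1.0 - z / f              # from Z = (1 - lam) * f
    return (x / lam, y / lam), lam


def unit_direction(img1, img2, ratio, f):
    """Unit vector along a line from two image points and ratio lam2 / lam1."""
    (x1, y1), (x2, y2) = img1, img2
    v = (ratio * x2 - x1, ratio * y2 - y1, (1.0 - ratio) * f)
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def cos2_from_images(line_a, line_b, f):
    """Squared cosine of the angle between two lines given by 3-D end points."""
    directions = []
    for start, end in (line_a, line_b):
        img1, lam1 = project(start, f)
        img2, lam2 = project(end, f)
        directions.append(unit_direction(img1, img2, lam2 / lam1, f))
    dot = sum(a * b for a, b in zip(*directions))
    return dot * dot


def move(point, angle, t):
    """Hypothetical rigid motion: rotate about the X axis, then translate."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x + t[0], c * y - s * z + t[1], s * y + c * z + t[2])


f = 1.0
line_a = ((0.0, 0.0, -5.0), (1.0, 0.5, -6.0))
line_b = ((-1.0, 1.0, -4.0), (0.5, 2.0, -7.0))
shift = (0.2, -0.1, -1.0)
line_a2 = tuple(move(p, 0.3, shift) for p in line_a)
line_b2 = tuple(move(p, 0.3, shift) for p in line_b)

cos2_first_view = cos2_from_images(line_a, line_b, f)
cos2_second_view = cos2_from_images(line_a2, line_b2, f)
```

Both values agree with the squared cosine computed directly from the 3-D direction vectors. In the actual problem the ratios are the unknowns, and equating the two expressions for every pair of lines and every pair of views yields the system to be solved.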

If we consider four lines in space (therefore 6 possible
different pairs of lines) and three views, then a simple counting
argument determines 12 equations in 12 unknowns over
the three views. We obtain a system of equations with the
following properties:

(a) The equations are of second order in each of the variables

(b) Only 4 of the 12 unknowns appear in each equation

(c) The motion parameters are not involved in the equations.

This system is solved for the orientation of lines in
space. Because of the characteristics above, the system is
better behaved numerically than it would otherwise be.

4.2  Determining  Motion 

In [25], it is shown that, once the orientations are
known, the motion parameters as well as the exact position
of the lines in space can be recovered by solving a linear
system of equations. The derivations, which are rather
lengthy, are not reproduced in this paper. Full details are
found in [25].

4.3 Experimental Results

We have randomly generated a large number of sets of
4 lines in space. We also added noise to the projection of
these lines in the image plane. Two points on the image line
are moved randomly in a 3x3 pixel area, creating a noisy
image line. Results of the experiments with LMDER are
shown graphically in Figure 4, which represents the average
difference between computed and actual angles between lines
in space versus precision of initial approximation. The
observation is that LMDER performs reasonably well with
fair initial approximation. Note that noise in the image is,
by itself, responsible for approximately 2% error.
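The noise model described above can be sketched as follows. This is an assumed reading of the text: the 3x3 pixel area is taken to be centred on the original pixel, so that each coordinate moves by -1, 0 or +1 with equal probability. The function names and sample coordinates are hypothetical.

```python
import random


def jitter_image_point(point, rng, half_width=1):
    """Move a pixel to a random cell of the window centred on it."""
    x, y = point
    return (x + rng.randint(-half_width, half_width),
            y + rng.randint(-half_width, half_width))


def noisy_image_line(p1, p2, rng):
    """Perturb the two points that define an image line."""
    return jitter_image_point(p1, rng), jitter_image_point(p2, rng)


rng = random.Random(0)            # fixed seed so the run is repeatable
clean = ((120, 45), (200, 310))
noisy = noisy_image_line(clean[0], clean[1], rng)
```

The further apart the two defining points are, the smaller the change in line orientation that a one-pixel displacement produces.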

The performance of the method on real cases of
camera-acquired pictures is currently under investigation.


5.  SUMMARY 

The detection and measurement of motion from images
is of fundamental importance in a number of applications.
Various approaches have been proposed and each of them,
of course, has its advantages and limitations [33]. This
paper has presented the ongoing work on the subject at the
Computer and Vision Research Center at The University of
Texas at Austin with emphasis put on the more recent
results. These results exploit properties of rigid motion
explicitly. Two such methods have been described. One of
these methods is based on the use of points and exploits the
fact that rigid motion does not change distances between
points. The other method uses lines and the property that
rigid motion does not alter angular configuration. Additional
ongoing research on the recovery of structure and motion,
based upon the observation of intensity and range images, is
presented in [34].


REFERENCES

[1] J.A. Leese, C.S. Novak, and V.R. Taylor, "The Determination
of Cloud Pattern Motion from Geosynchronous
Satellite Image Data," Pattern Recognition, 2,
pp. 279-292, 1970.

[2] R.M. Endlich, D.E. Wolf, D.J. Hill, and A.E. Drain,
"Use of a Pattern Recognition Technique of Satellite
Photographs," J. Appl. Met., 10, pp. 103-117, 1971.

[3] J.K. Aggarwal and N.I. Badler (Eds.), Abstracts of the
Workshop on Computer Analysis of Time-Varying
Imagery, University of Pennsylvania, Moore School of
Electrical Engineering, Philadelphia, PA, April 1979.

[4] J.K. Aggarwal and N.I. Badler (Guest Eds.), Special
Issue on Motion and Time-Varying Imagery, IEEE
Trans. on PAMI, Vol. PAMI-2, No. 6, November
1980.

[5] W.E. Snyder (Guest Ed.), Computer Analysis of Time-Varying
Images, IEEE Computer, Vol. 14, No. 8,
August 1981.

[6] J.K. Aggarwal (Guest Ed.), Motion and Time Varying
Imagery, Computer Vision, Graphics and Image Processing,
Vol. 21, Nos. 1 and 2, January, February
1983.

[7] T.S. Huang, Image Sequence Analysis, Springer-Verlag,
New York, 1981.

[8] NATO Advanced Study Institute on Image Sequence
Processing and Dynamic Scene Analysis, Advance
Abstracts of Invited and Contributory Papers, June 21-
July 2, 1982, Braunlage, West Germany.

[9] Siggraph/Sigart Interdisciplinary Workshop on
Motion: Representation and Perception, Toronto,
Canada, April 4-6, 1983.

[10] International Workshop on Time-Varying Image Processing
and Moving Object Recognition, Florence,
Italy, May 1982.

[11] W.N. Martin and J.K. Aggarwal, "Dynamic Scene
Analysis: a Survey," Computer Graphics and Image
Processing, 7, pp. 336-374, 1978.

[12] H.-H. Nagel, "Analysis Techniques for Image
Sequences," in Proc. IJCPR-78, Kyoto, Japan,
November 1978, pp. 186-211.

[13] J.K. Aggarwal and W.N. Martin, "Dynamic Scene
Analysis," in the book Image Sequence Processing and
Dynamic Scene Analysis, edited by T.S. Huang,
Springer-Verlag, 1983, pp. 40-74.

[14] J.K. Aggarwal, "Three-Dimensional Description of
Objects and Dynamic Scene Analysis," Digital Image
Analysis, edited by S. Levialdi, Pitman, 1984, pp. 29-46.

[15] H.-H. Nagel, "What Can We Learn from Applications?"
in the book Image Sequence Analysis, edited
by T.S. Huang, Springer-Verlag, 1981, pp. 19-228.

[16] T.S. Huang (Editor), Image Sequence Processing and
Dynamic Scene Analysis, Proceedings of NATO
Advanced Study Institute at Braunlage, West Germany,
Springer-Verlag, 1983.

[17] J.K. Aggarwal and R.O. Duda, "Computer Analysis of
Moving Polygonal Images," IEEE Trans. on Computers,
24, No. 10, pp. 966-976, 1975.

[18] W.K. Chow and J.K. Aggarwal, "Computer Analysis of
Planar Curvilinear Moving Images," IEEE Trans. on
Computers, 26, No. 2, pp. 179-185, 1977.

[19] W.N. Martin and J.K. Aggarwal, "Computer Analysis
of Dynamic Scenes Containing Curvilinear Figures,"
Pattern Recognition, 11, pp. 169-178, 1979.

[20] R. Jain, W.N. Martin, and J.K. Aggarwal, "Segmentation
Through the Detection of Changes Due to
Motion," Computer Graphics and Image Processing,
11, pp. 13-34, 1979.

[21] S. Yalamanchili, W.N. Martin, and J.K. Aggarwal,
"Extraction of Moving Object Descriptions via
Differencing," Computer Graphics and Image Processing,
18, pp. 188-201, 1982.

[22] J.W. Roach and J.K. Aggarwal, "Computer Tracking of
Objects Moving in Space," IEEE Trans. on Pattern
Analysis and Machine Intelligence, 1, No. 2, pp. 127-135,
1979.

[23] J.W. Roach and J.K. Aggarwal, "Determining the
Movement of Objects from a Sequence of Images,"
IEEE Trans. on Pattern Analysis and Machine Intelligence,
2, No. 6, pp. 554-562, 1980.

[24] A. Mitiche, S. Seida, and J.K. Aggarwal, "Determining
Position and Displacement in Space from Images,"
Proc. IEEE Conf. on Computer Vision and Pattern
Recognition, San Francisco, CA, pp. 504-509, June
1985.

[25] A. Mitiche, S. Seida, and J.K. Aggarwal, "Interpretation
of Structure and Motion from Line Correspondences,"
submitted to Pattern Analysis and Machine
Intelligence.

[26] J.A. Webb and J.K. Aggarwal, "Structure from Motion
of Rigid and Jointed Objects," Artificial Intelligence,
19, pp. 107-130, 1982.

[27] D.D. Hoffman and B. Flinchbaugh, The Interpretation
of Biological Motion, MIT AI Memo 608, Massachusetts
Institute of Technology, Cambridge, MA,
1980.

[28] A. Mitiche, "Computation of Optical Flow and Rigid
Motion," Proc. Workshop on Computer Vision:
Representation and Control, pp. 63-71, Annapolis,
MD, 1984.

[29] A. Mitiche, "On Combining Stereopsis and Kineopsis
for Space Perception," Proc. First Conference on
Artificial Intelligence Applications, Denver, CO, pp.
156-160, 1984.

[30] R.Y. Tsai and T.S. Huang, "Uniqueness and Estimation
of Three-Dimensional Motion Parameters of Rigid
Objects With Curved Surfaces," IEEE Trans. on Pattern
Analysis and Machine Intelligence, 6, No. 1, pp.
13-26, 1984.

[31] A. Mitiche and P. Bouthemy, "Tracking Modelled
Structures Using Binocular Images," Computer Vision,
Graphics, and Image Processing, to appear.

[32] LMDER, Minpack Subroutine, Argonne National
Laboratories, March 1980.

[33] J.K. Aggarwal and A. Mitiche, "Structure and Motion
from Images: Fact and Fiction," Third Workshop on
Computer Vision: Representation and Control, Bellaire,
MI, pp. 127-128, 1985.

[34] J.K. Aggarwal and M. Magee, "Determining Motion
Parameters Using Intensity Guided Range Sensing," to
appear in Pattern Recognition.


Figure 3. Projective configuration for a line in space
with respect to viewing model S.

Figure 4. Average error in computed angles versus
precision of initial guesses. Axes: average angle error (in
degrees) against precision of initial guesses (%).




SURFACES  FROM  STEREO 


William Hoff
Narendra Ahuja


University of Illinois
Coordinated Science Laboratory
1101 W. Springfield Ave.
Urbana, IL 61801


ABSTRACT 

This paper describes an algorithm and its implementation
to compute the surface map of a scene from a stereo pair of
images. The algorithm detects and matches zero crossings resulting
after applying the Marr-Hildreth operator to the images.
Ambiguities are resolved by choosing the matches which allow a
smooth surface to be interpolated through depth points. It
differs from other stereo algorithms in that it uses smoothness of
the resulting surface as a criterion for matching. An interleaved
sequence of matching-for-smoothness and surface interpolation
operations generates a multiresolution hierarchy of surface
maps starting from the coarse and progressing towards the
fine. As another important feature of the algorithm,
the surface interpolation process takes into account the
occluding and ridge contours in the scene, which occur where
depth and orientation change abruptly. Interpolation is performed
within regions enclosed by these contours. The algorithm
takes a fairly unrestrictive view of real world objects captured
in the following smoothness constraint: surfaces in the real
world are smooth and continuous except across relatively rare
occluding and ridge contours.


1.  INTRODUCTION 

This paper describes a stereo vision algorithm and its
implementation. The purpose of stereo algorithms is to take two
images of a scene, from slightly different viewpoints, and produce
a complete depth map of the visible surfaces. The usual
paradigm of these algorithms is: (1) detect suitable features in
each image, (2) match corresponding features to determine their
depths, and (3) interpolate to obtain a complete depth map
[Barn82]. For the most part, the features that have been used in
stereo have been low level, that is, they are only a function of
the local intensities and are not semantic in nature. However,
due to their simplicity, low level features can have many ambiguous
matches, and this makes the matching step difficult. Also,
occluding and ridge contours in the scene (where the depth and
orientation, respectively, change abruptly) create difficulties for
matching and interpolation.

The algorithm described addresses these difficulties by using a
smoothness constraint for matching that explicitly incorporates
the existence of depth and orientation contours in the
computation, and enforces surface smoothness everywhere except
across such contours. It integrates matching, contour detection
and surface interpolation into a single process for extracting
surfaces from stereo images. The algorithm is fairly
domain-independent since it uses no constraint other than the
assumption that objects have smooth surfaces, i.e., the depth and
surface normal vary gradually except across relatively rare
occluding and ridge contours.

The support of the National Science Foundation under grant ECS
8352408 is gratefully acknowledged.


2.  BACKGROUND  AND  MOTIVATION

Existing algorithms may be classified according to the type of
features they detect. Area-based algorithms attempt to match small
windows in each image by correlating their intensities [Genn77,
Hann80, Mora81, Pant78]. Edge-based algorithms detect edge or
point-like features and attempt to match those [Arno78, Barn80,
Grim81, Hend79]. Baker uses both areas and edges [Bake81].
Area-based algorithms have been applied successfully to the
analysis of aerial images, where the terrain is smoothly varying
and continuous. However, they have difficulty in dealing with
scenes that contain depth and orientation discontinuities, because
the matching windows may cross surface discontinuities. Edges are
intrinsically more localizable and thus are more suitable in these
cases. Also, they deal with intensity changes rather than raw
intensities and thus are a better characteristic of physical
changes in the scene.

Ambiguities may arise when matching edges because there is little
to characterize an edge besides its location and orientation, and
an edge in one image may match several in the other image. To
select the correct match, Barnard and Thompson use the constraint
that nearby image points should have nearly the same disparity
[Barn80]. Baker selects edge matches so that the left-right
ordering of edge points within a row in one image is the same as
the left-right ordering in the corresponding row in the other
image [Bake81]. In the algorithm of Marr-Poggio and Grimson
[Grim81], ambiguities are resolved by choosing the match with a
disparity that has its sign (convergent, divergent, or zero) the
same as the majority of the non-ambiguous points in the
neighborhood. Mayhew and Frisby note that edges which lie on a
single continuous contour in one image should also have this
continuity in the other image, and use this principle to eliminate
ambiguities [Mayh81].

All previous algorithms complete the matching process before
interpolating to obtain a dense depth map. Uniqueness of matching
is only enforced by conditions that involve simple local
relationships among disparity values, as mentioned above, and not
the properties of the resulting surface. However, since a given
disparity value implies a depth value, a stereo pair with matching
ambiguities implies multiple surfaces, having different smoothness
properties. The relative acceptability of these surfaces should be
determined by the nature of the real world objects, namely, that
their surfaces are smooth in the sense that the normal direction
varies slowly, except across relatively rare creases and ridges.
Thus, the expectations about surface characteristics have
implications about resolution of matching ambiguities. This
suggests that matching decisions should take into account the
properties of the resulting surfaces: the locations of depth and
orientation contours as well as smoothness of the surface parts
enclosed by these contours. This is in contrast to the traditional
first-finish-matching-then-interpolate approach used by all
existing stereo algorithms.




We conducted an experiment in human stereo vision to test the
merit of our smoothness-of-disparity constraint against the
constant-local-disparity constraint used in the past [Hoff85]. We
generated a random dot stereogram such that the use of the two
different matching constraints would yield the perception of two
different surfaces. The random dot stereogram portrays a surface
whose height is that of a cosine wave, and which is unambiguous
everywhere except for a small region centered on the peak. This
ambiguous region can be perceived as a smooth continuation of the
cosine wave, or as a surface which is locally rough but has
approximately constant height. The observer fixates at the depth
midway between the two surfaces. Figure 1 shows the experimental
setup. In our experiment, most observers saw the smooth peak
predicted by our smoothness constraint, instead of the rough peak
predicted by the constant-local-disparity constraint.

Thus, to enforce the surface smoothness constraint it is not
sufficient to use the local disparity histogram as has been done
in the past. Rather, both the values of disparities and the
locations of the features giving rise to these values should be
taken into account. The enforcement of local constancy of
disparity biases the resulting surface towards a frontal
orientation.

We have incorporated the more general, piecewise smoothness
constraint into our algorithm to obtain surfaces from stereo. Note
that as a consequence of the integration process, our algorithm
carries out the stereo-to-surface transformation more
homogeneously than the existing algorithms, where most of the
computation is devoted to feature matching and thus to a
stereo-to-depth-points transformation. The surfaces in these
algorithms only result from the final, independent step of
interpolation through the depth points.

3.  OVERVIEW  OF  THE  ALGORITHM 

An outline of the algorithm is shown in Figure 2. The processing
is done in a coarse-to-fine-resolution mode. The algorithm starts
with an initially specified, arbitrary estimate of the surface
map, e.g., a flat frontal surface at some depth. At each
resolution level, the following steps are performed. First, edges
are detected in each of the two stereo images. Matches are sought
for the edges in locations predicted by the surface depth at the
previous, coarser resolution level. Each possible match obtained
corresponds to a point whose position in the array as well as
height are known. The match is recorded in an (x,y,z) array by
locating points with appropriate height z for each edge point
(x,y). This results in a sparse set of spikes with tips that must
lie on the surface. Second, the largest possible smooth patches
are fit, centered at each (x,y) position in an image. Third, a
comparison of adjacent patches identifies those pairs that differ
in depth or orientation. Such pairs of patches yield estimates of
depth and orientation contours in an image. Finally, a smooth
surface is interpolated through the depth points in each region
surrounded by occluding or ridge contours. This gives a piecewise
smooth surface map at the given resolution. The process is then
repeated at finer resolution, using the current surface to predict
matching locations of edges at the finer resolution. Processing at
successively finer resolutions yields surfaces at increasingly
fine resolution.

The algorithm matches individual points in the left image with the
corresponding points in the right image. Currently, these points
are the zero crossings of the Marr-Hildreth operator. For each
pair of corresponding points, the depth may be calculated from the
disparity in the positions of the two points. The matching is
driven from left to right, so that the result is a set of points
in the left image, each labeled with one or more depth values.
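
The paper does not state its camera model here. As a minimal sketch, assuming the common parallel-axis pinhole model (an assumption, not taken from the paper), depth follows from disparity as Z = f * B / d. The names `focal_px` and `baseline` and the numbers used are illustrative only.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth under a parallel-axis pinhole model: Z = f * B / d.

    disparity_px: horizontal offset between matched points, in pixels.
    focal_px:     focal length in pixels (hypothetical parameter).
    baseline:     camera separation; the result uses the same unit.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline / disparity_px


# A left-image point with two candidate matches carries two depths.
candidate_disparities = [8.0, 20.0]
candidate_depths = [depth_from_disparity(d, 500.0, 12.0)
                    for d in candidate_disparities]
```

Under this model a larger disparity means a point closer to the cameras, which is why an ambiguous match yields several competing depth values for the same image point.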

The depth points have the following characteristics. First, they
may have ambiguous depth values. This is caused by the fact that
in some cases, a point in the left image can match more than one
point in the right image, implying more than one possible depth
for that point. Second, some of the points may have no correct
depth value, due to noise or occlusion. There may also be
incorrect guidance for the matching from the coarser levels.
Third, the points are sparse, which is characteristic of zero
crossings. Fourth, the depth values are noisy, which can be caused
by image noise. It can also be caused by the blurring effect of
the Marr-Hildreth operator. In general, the uncertainty in the
position of the zero crossing is proportional to the size of the
operator.

Thus, to solve the matching problem and interpolate a surface, we
invoke the smoothness constraint: that objects in the real world
tend to have piecewise smooth surfaces. The depth values are
therefore assumed to be noisy samples of a surface which is smooth
in the sense that the depth and surface normal vary slowly, but
which may contain depth and orientation discontinuities at the
relatively rare occluding boundaries and creases. We start by
reconstructing a polyhedral approximation of the original surface
by fitting planar patches to the depth points. From these, we
obtain a piecewise smooth approximation to the original surface,
with depth and orientation discontinuities located.


4.  DETAILED  DESCRIPTION 

The following is a detailed description of the algorithm. It was
implemented in "C" on a VAX 11/780. Some of the runs were done on
a Gould 9050 superminicomputer.


4.1.  DETECTION  OF  FEATURES 

The first phase of the algorithm is similar to the Marr and
Grimson method of detecting features. The left and right images
are each convolved with the Marr-Hildreth operator (Laplacian of a
Gaussian) of different sizes (different widths of the Gaussian).
Zero crossings are then detected. These correspond to significant
intensity changes in the image. The result is a set of left/right
pairs of images showing zero crossing locations. Each pair is at a
different scale of resolution. In the implementation the
resolution was reduced by a factor of two at each level, yielding
image sizes of 256x256, 128x128, and 64x64. The effective width of
the Marr-Hildreth operator (the diameter of the central negative
region) was the same for each level, i.e., 6.
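
The operator and its "effective width" can be sketched as follows. This uses the standard two-dimensional Laplacian-of-Gaussian form; the function names are illustrative, and the one-dimensional `zero_crossings` helper is a simplification of what an image-based detector would do.

```python
import math


def log_operator(r, sigma):
    """Laplacian of a Gaussian at radius r (standard 2-D form)."""
    s2 = sigma * sigma
    q = r * r / (2.0 * s2)
    return -(1.0 / (math.pi * s2 * s2)) * (1.0 - q) * math.exp(-q)


def sigma_from_width(width):
    """Gaussian sigma whose central negative region has this diameter.

    The operator changes sign where r**2 == 2 * sigma**2, so the
    diameter of the central region is 2 * sqrt(2) * sigma.
    """
    return width / (2.0 * math.sqrt(2.0))


def zero_crossings(row):
    """Indices i where a filtered row changes sign between i and i + 1."""
    return [i for i in range(len(row) - 1) if row[i] * row[i + 1] < 0]
```

With the width of 6 quoted in the text, `sigma_from_width(6.0)` gives a Gaussian standard deviation of about 2.12 pixels.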

4.2.  MATCHING  ZERO  CROSSINGS 

Matching of zero crossings is done in a coarse to fine process. At
the coarsest level the algorithm must be supplied with an initial
estimate of the depth map, i.e., a constant. To match a zero
crossing in the left image, the algorithm attempts to find one or
more similar zero crossings in the right image. It searches for
candidate zero crossings in a small horizontal interval (window)
centered at the location predicted by the depth estimate. The
epipolar lines are assumed to be horizontal so that searching is
restricted to one dimension. The window width is equal to twice
the width of the Gaussian filter used to create the zero crossings
at this level, as in the Marr-Grimson algorithm. This has the
consequence that there is a 50% chance that there will be only one
zero crossing of the correct sign in the window; otherwise there
will usually be only two. A zero crossing that is found in the
window is a match if it has an orientation similar to the
orientation of the zero crossing in the left image. In the
implementation, orientations were quantized into 32 levels,
corresponding to 0-360 degrees, and a difference of up to 2 levels
is allowed for a match. If there is one match, the point is
unambiguous and the depth (or disparity) is known. If there is no
match, nothing further can be done with this point and so it is
ignored. If there are two or more matches, the point is ambiguous
and the algorithm tries to resolve this ambiguity in later
processing.
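
The window search can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the data layout (`right_row` as a list of positions and orientation levels), the sign convention for disparity, and all function names are hypothetical, and the check on the sign of the zero crossing is omitted for brevity.

```python
N_BINS = 32        # orientation quantization levels over 0-360 degrees
MAX_BIN_DIFF = 2   # largest orientation difference accepted for a match


def orientation_bin(angle_deg):
    """Quantize an orientation in degrees into one of N_BINS levels."""
    return int((angle_deg % 360.0) / (360.0 / N_BINS)) % N_BINS


def bin_difference(a, b):
    """Circular distance between two orientation levels."""
    d = abs(a - b) % N_BINS
    return min(d, N_BINS - d)


def match_candidates(x_left, bin_left, right_row, predicted_disparity, window):
    """Disparities of right-image zero crossings that match a left one.

    right_row: list of (x, orientation_bin) on one epipolar line.
    window:    total width of the horizontal search interval.
    Assumes disparity = x_left - x_right (a convention chosen here).
    """
    center = x_left - predicted_disparity
    half = window / 2.0
    return [x_left - x for x, b in right_row
            if abs(x - center) <= half
            and bin_difference(bin_left, b) <= MAX_BIN_DIFF]


def classify(matches):
    """Label a point by its number of matches."""
    if not matches:
        return "ignored"
    if len(matches) == 1:
        return "unambiguous"
    # The implementation described in the paper keeps at most two matches.
    return "ambiguous" if len(matches) == 2 else "unmatchable"
```

The circular distance matters because levels 31 and 1 are only two steps apart even though their numeric difference is 30.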

The program attempts to match only non-horizontal zero crossings,
since the disparity of horizontal zero crossings is subject to
large error. The percentage of ambiguous points is smaller than
50%, due to the added constraint that the orientations must be
similar. Also, for computational reasons, the maximum number of
matches that are allowed is two. If a point has more than two
matches, it is treated as unmatchable.

A surface is now interpolated through the depth values by the
interpolation process described below. To match the next finer
level of zero crossings, the depth estimate is given by the depth
(footnote 1) of the surface at the same point in the current
level. Thus, the interpolated surface at a given level guides the
matching at the next level.
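
Because resolution doubles between levels, a disparity surface expressed in pixels must be rescaled before it can predict match locations at the next level; the results section notes that surface heights double for this reason. A minimal sketch follows, assuming nearest-neighbor replication (the paper does not say how the coarse surface is resampled, and the function name is hypothetical).

```python
def upsample_disparity(surface):
    """Map a coarse disparity surface onto a grid of twice the resolution.

    Each coarse cell is replicated into a 2x2 block (nearest-neighbor,
    a simplification), and every value is doubled because a disparity
    measured in pixels doubles when the pixel size is halved.
    """
    fine = []
    for row in surface:
        fine_row = []
        for d in row:
            fine_row.extend([2.0 * d, 2.0 * d])
        fine.append(fine_row)
        fine.append(list(fine_row))
    return fine
```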


4.3.  FITTING  PLANES

The algorithm fits planes to the depth values in circular regions
centered at every 4th grid point in the image. The largest
possible disc is identified at each point under the constraint
that the depth points in the disc are a good fit to a plane. We
used a maximum radius of 20 to limit the computation. For each
region, the two planes having the smallest squared error of fit to
the depth values in that region are found. There may be more than
one plane because some of the depth values may be suspected to be
mismatches and may be ignored in fitting a patch, while other
depth points may be ambiguous, with multiple values.

To determine whether or not the points are a good fit to a plane
requires an estimate of the noise in the depth values. We assume
that the major component of this noise is due to the fluctuation
of the zero crossings about the true edge position. Berzins
[Berz84] did some analysis on the displacement error for specific
image situations, and found that the error was usually much less
than $\sigma_G$, where $\sigma_G$ is the standard deviation of the
Gaussian in the Marr-Hildreth operator:

$\nabla^2 G(r) = -\frac{1}{\pi\sigma_G^4} \left[ 1 - \frac{r^2}{2\sigma_G^2} \right] \exp\left( -\frac{r^2}{2\sigma_G^2} \right)$

Even in unusually bad cases, the error was comparable to
$\sigma_G$. If the displacement error is normally distributed,
then 95% of the time the error will be less than $2\sigma_N$,
where $\sigma_N$ is the standard deviation of the noise.
Therefore, we assume $2\sigma_N = \sigma_G$, which is consistent
with the displacement of zero crossings observed in our
experiments (see Section 5). For a given number of points in a
region, within the distance $2\sigma_N$ of the plane, the
probability of the sum of squared errors being less than or equal
to $e$ may be determined from the chi-squared distribution. The
program determines the maximum expected squared error for a 95%
confidence level. If the squared error exceeds this value, the
plane is rejected. As a further check on the validity of the
planar fit for the region, we count the number of points that lie
beyond the distance $2\sigma_N$ from the plane. If this number
exceeds the expected 5% level, the plane is rejected.
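
The two acceptance checks can be sketched as follows. Several details are assumptions made for illustration: the chi-squared percentile is computed with the Wilson-Hilferty approximation rather than a table, the degrees of freedom are taken as the number of inliers minus the three plane parameters, and the function names are hypothetical.

```python
import math

Z_95 = 1.6449  # standard normal 95th percentile


def chi2_95(dof):
    """Approximate 95th percentile of chi-squared (Wilson-Hilferty)."""
    t = 2.0 / (9.0 * dof)
    return dof * (1.0 - t + Z_95 * math.sqrt(t)) ** 3


def accept_plane(residuals, sigma_n):
    """Apply both checks to the residuals of one plane fit.

    residuals: signed distances of the depth points from the plane.
    sigma_n:   assumed standard deviation of the depth noise.
    """
    limit = 2.0 * sigma_n
    inliers = [r for r in residuals if abs(r) <= limit]
    outliers = len(residuals) - len(inliers)
    if len(inliers) < 4:
        return False  # too few points to test a three-parameter plane
    # Check 1: squared error of the inliers against the 95% bound.
    # Degrees of freedom = inliers minus 3 plane parameters (assumed).
    sse = sum(r * r for r in inliers)
    if sse > sigma_n ** 2 * chi2_95(len(inliers) - 3):
        return False
    # Check 2: no more than the expected 5% of points beyond 2 sigma.
    return outliers <= 0.05 * len(residuals)
```

The first check rejects regions whose inliers are collectively too scattered; the second rejects regions where too many points had to be set aside as outliers to make the fit look good.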

A crucial part of this algorithm is the use of the Hough transform
to fit planes. This is important because the ambiguous and
mismatched points would lead to combinatorial explosion if a
standard least squares method such as Gaussian elimination were
used to fit planes to each possible subset of depth points in a
region. The Hough transform is used to calculate the least squares
fit plane by the following method. A three-dimensional parameter
space is set up, with each dimension corresponding to a parameter
in the equation of the plane:

$z = ax + by + c$

For each point $p_i$ in the region, the parameter space cells are
incremented at the locations corresponding to the solutions
$(a, b, c)$ of the equation

$c = z_i - a x_i - b y_i + \epsilon_i$

where $\epsilon_i$ represents the amount of error of fit of the
point $(x_i, y_i, z_i)$ to the plane represented by $(a, b, c)$.
The array is incremented at each location $(a, b, c)$ by the
amount $\epsilon_i^2$, the squared error of fit of
$(x_i, y_i, z_i)$ to the plane $(a, b, c)$. After all points have
been entered in this manner, the minimum entry in the parameter
array represents the solution with the minimum squared error.

Footnote 1: In this paper, "depth" and "disparity" are used
interchangeably because the depth can be readily calculated from
the disparity and the camera model.

To allow for mismatched points and ambiguous points, a maximum
allowed distance of a point from the plane is computed. In the
implementation, twice the estimated standard deviation of the
noise was used as the maximum distance. If a point is further than
this from the plane, it is considered to be an outlier to that
plane, and its squared error does not contribute to the total. In
the case of ambiguous points, only one of the depth values
contributes to any plane: the one which is closest in depth. An
important advantage of the Hough transform is that it requires a
constant amount of work for each depth value, and the amount of
work is not dependent on the number of ambiguous points or
mismatches.
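
The accumulation can be sketched as a search over a coarse parameter grid. This is a sketch under stated assumptions rather than the authors' implementation: the residual is measured along z, the loops visit cells first and points second (the totals per cell are the same as incrementing per point), and the rule for choosing the winning cell is assumed. Because outliers add no error, a cell far from every point would trivially have the minimum entry, so this sketch prefers the cell with the most inliers and uses squared error only to break ties.

```python
def hough_plane_fit(points, max_dist, c_min=0.0,
                    n_slope=11, n_offset=16, c_step=1.0):
    """Search a coarse (a, b, c) grid for the plane z = a*x + b*y + c.

    points:   list of (x, y, [z candidates]); ambiguous points carry
              more than one candidate depth.
    max_dist: residuals larger than this mark a point as an outlier.
    Returns (a, b, c, inlier_count, squared_error) of the best cell.
    """
    slopes = [-1.0 + 2.0 * i / (n_slope - 1) for i in range(n_slope)]
    offsets = [c_min + c_step * k for k in range(n_offset)]
    best = None
    for a in slopes:
        for b in slopes:
            for c in offsets:
                count, error = 0, 0.0
                for x, y, zs in points:
                    # Only the candidate depth closest to the plane counts.
                    e = min(abs(z - (a * x + b * y + c)) for z in zs)
                    if e <= max_dist:      # outliers add no error
                        count += 1
                        error += e * e
                # Assumed selection rule: most inliers, then least error.
                key = (-count, error)
                if best is None or key < best[0]:
                    best = (key, (a, b, c, count, error))
    return best[1]
```

The work per depth point is fixed by the grid size, so adding ambiguous or mismatched points does not change the cost per point.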

A disadvantage of the Hough transform method is the limited
resolution of the parameter space. In the implementation, the
parameter space was 11x11x16, with the first two dimensions used
for the x and y slopes from -1.0 to 1.0, and the third dimension
for the z offset. This allowed a resolution of 0.2 in the slope,
and 1.0 in the z value. A higher resolution could be used, but at
the cost of additional computation. However, because the planes
obtained are only local approximations, a very fine resolution is
not crucial.


4.4.  CONFLICT  RESOLUTION 

Overlapping planar patches obtained from the previous step may be
inconsistent, in the sense that a depth value that is a good fit
to one of the patches is treated as an outlier by the other patch.
This situation occurs, for example, at an occluding boundary,
where a patch from one side of the boundary may extend for a short
distance into the other side, treating points from the other side
as outliers. These patches are shrunk until they no longer contain
outliers. If an image region has multiple candidate patches, the
largest of the shrunken patches is selected as the best fit for
the region. If there is no one patch that is largest, the patch is
selected that is most consistent with its neighbors, or failing
that, the one with the smallest least squared error.

If an outlier point is really a mismatch, it should be ignored and
any patch which contains it should not be reduced. This situation
can usually be detected because the planar patches which contain
the outlier point will generally be much smaller than the ones
that do not, due to the fact that the outlier point is not part of
any consistent surface. The algorithm handles this situation by
keeping track of which patches are maximal, i.e., not completely
contained in any other patch. If an outlier point is not on any
maximal patch, then it is probably a mismatch and can be ignored.
When a patch is reduced in size, other patches within it may
become maximal. Therefore, this process is repeated until no more
shrinking is necessary.
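
The notion of a maximal patch can be sketched as a containment test on discs. The representation of a patch as `(cx, cy, radius)` and the function names are assumptions for illustration; patches are assumed to be distinct, and the shrinking loop itself is not shown.

```python
import math


def contains(outer, inner):
    """True if disc `inner` lies completely inside disc `outer`.

    Each disc is (cx, cy, radius).
    """
    dist = math.hypot(outer[0] - inner[0], outer[1] - inner[1])
    return dist + inner[2] <= outer[2]


def maximal_patches(patches):
    """Patches that are not completely contained in any other patch."""
    return [p for i, p in enumerate(patches)
            if not any(i != j and contains(q, p)
                       for j, q in enumerate(patches))]


def probable_mismatch(fitting_patches, patches):
    """A point fitted only by non-maximal patches is probably a mismatch.

    fitting_patches: the patches for which the point is a good fit.
    """
    maximal = maximal_patches(patches)
    return not any(p in maximal for p in fitting_patches)
```

Since shrinking one patch can make a patch inside it maximal, `maximal_patches` would need to be recomputed after every round of shrinking.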

The result is a set of planar patches which are no longer
inconsistent, and there is a unique patch for every region. The
ambiguous depth values can now be resolved by choosing the match
which is on the largest number of maximal planes. If none of the
depth values for a point are on any maximal plane, the point is a
mismatch and is removed.

4.5.  LOCATING  EDGES

The next step in the algorithm is to locate depth discontinuities.
Such discontinuities should ideally be defined by adjacent pairs
of patches that differ in depth. However, due to the sparsity of
the input data points, it is possible that there will be planar
patches with good fits across discontinuities. These patches will
generally be small compared to the patches which do not cross the
discontinuity. Therefore, the edge can be detected by examining
the larger patches first, and eliminating those smaller patches
which cross the edge.

The program checks the difference in depth, at the midpoint of
overlap, for all overlapping or adjacent patches. If this
difference is greater than a threshold (footnote 2) then an edge
point is marked there. The edge is grown for a short distance
perpendicular to the line joining the centers of the overlapping
patches, such that its extension does not contradict the depth
points in the neighborhood, meaning that the edge consistently
separates high depth values on one side and low depth values on
the other. Any smaller patches which overlap this edge band are
shrunk so that they do not contain the band.
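
The depth comparison between two patches can be sketched as follows. The patch layout `(cx, cy, r, a, b, c)`, the geometric definition of the "midpoint of overlap", and the function names are assumptions; the factor 3.5 is the threshold the paper reports in its footnote, in units of the noise standard deviation. Growing and thinning the edge are not shown.

```python
import math

EDGE_FACTOR = 3.5  # threshold in units of sigma_n, per the paper's footnote


def plane_depth(patch, x, y):
    """Depth of a planar patch (cx, cy, r, a, b, c) at image point (x, y)."""
    _, _, _, a, b, c = patch
    return a * x + b * y + c


def overlap_midpoint(p, q):
    """Midpoint of the overlap of two discs along the line joining centers.

    Returns None if the centers coincide or the discs neither overlap
    nor touch. This definition is an assumption of the sketch.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > p[2] + q[2]:
        return None
    lo = max(-p[2], d - q[2])   # overlap interval, measured from p's center
    hi = min(p[2], d + q[2])
    t = (lo + hi) / 2.0
    return p[0] + dx * t / d, p[1] + dy * t / d


def depth_edge_point(p, q, sigma_n):
    """Edge location if the two patches disagree in depth, else None."""
    mid = overlap_midpoint(p, q)
    if mid is None:
        return None
    gap = abs(plane_depth(p, *mid) - plane_depth(q, *mid))
    return mid if gap > EDGE_FACTOR * sigma_n else None
```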

The result is a set of possible edge points, which indicate where
the occluding contours could lie. The edge points are thinned
until a one-pixel wide contour is obtained. Any patches which
overlap this contour are shrunk.

After the occluding contours are found, the orientation
discontinuities could then be found by the same methods, i.e.,
checking overlapping or adjacent patches for significant
orientation differences at the midpoint of overlap. This step is
not implemented in the current version, but will be in future
versions.


4.6.  GENERATING  A  COMPLETE  SURFACE  MAP

The final step is to interpolate to obtain a complete surface. The
algorithm of Terzopoulos [Terz83] was used. The input to this
algorithm is the unambiguous depth points and the estimated depth
discontinuities. We also provided an initial surface estimate,
which was obtained by just averaging the heights of the planar
patches. The algorithm iteratively smoothed the surface while
requiring it to pass close to the depth points.

After iterating until the amount of change was very small, the
surface was analyzed for evidence of depth discontinuities that
were not detected earlier. If the surface gradient is high, then
there probably should be a depth discontinuity there. These were
located by taking the Laplacian of the surface, and noting the
zero crossings. At zero crossings where the surface gradient
exceeded a threshold, a new edge point was marked. The Terzopoulos
algorithm was again run with the new edges until the change in the
surface was small. The result is the final estimate of the surface
at the current level.
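
The idea of smoothing that stops at marked discontinuities can be illustrated with a much simpler scheme than the one used in the paper. The sketch below is plain Gauss-Seidel relaxation of a membrane (each free pixel is replaced by the mean of its neighbors), not the Terzopoulos algorithm. It also holds the depth points fixed, whereas the paper only requires the surface to pass close to them. All names are hypothetical.

```python
def relax_surface(height, width, constraints, edges, init=0.0,
                  tol=1e-9, max_iter=10000):
    """Smooth a depth grid without smoothing across marked edge pixels.

    constraints: {(row, col): depth} held fixed (a hard constraint; the
                 paper only requires the surface to pass close to them).
    edges:       set of (row, col) pixels marked as depth discontinuities.
    Returns a dict {(row, col): depth} for all non-edge pixels.
    """
    cells = [(r, c) for r in range(height) for c in range(width)
             if (r, c) not in edges]
    z = {p: constraints.get(p, init) for p in cells}
    for _ in range(max_iter):
        change = 0.0
        for r, c in cells:
            if (r, c) in constraints:
                continue
            # Edge pixels are absent from z, so they never act as neighbors.
            nbrs = [z[q] for q in ((r - 1, c), (r + 1, c),
                                   (r, c - 1), (r, c + 1)) if q in z]
            if not nbrs:
                continue
            new = sum(nbrs) / len(nbrs)
            change = max(change, abs(new - z[(r, c)]))
            z[(r, c)] = new
        if change < tol:
            break
    return z
```

With a column of edge pixels separating two constrained regions, each side settles to its own depth instead of blending into a ramp across the discontinuity.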

This surface is now used as the depth estimate for matching at the
next level. In the vicinity of occluding contours, two depth
estimates are used: one from the high side of the surface, and one
from the low side. This is done because the exact location of the
occluding contour is uncertain.


5.  EXAMPLES

Results are presented for running the algorithm on the stereo pair
of images shown in Figure 3. These images show a baseball resting
on a newspaper. Because of the aspect ratio of the cameras, the
spherical baseball appears to be an ellipsoid. Each image is
256x256 pixels in size. The images were separately convolved with
the Marr-Hildreth operator at different scales, and zero crossings
extracted. Figure 4 shows the zero crossings at three scales of
resolution: 256x256, 128x128, and 64x64. The effective width of
the operator is 6 for each case.


5.1.  RESULTS  FROM  BASEBALL  IMAGES

A constant depth estimate of 3 was supplied to the algorithm to
match the 64x64 level. The estimate corresponds to a depth midway
between the top of the ball and the newspaper (footnote 3). The
initial matching results for this level are summarized in Table 1,
as are the matching results for the other levels.

Table 1
Matching Summary

Size                                   64     128     256
0 matches                              32     230    1799
1 match                               221    1328    5169
2 matches                              21     269     798
> 2 matches                             4      19      87
Total non-horizontal zero crossings   278    1846    7853

Planar patches were fit to the depth points obtained from the
above matching process as described earlier. The ambiguous depth
points were resolved and mismatched points were removed. The
algorithm then searched for depth edges by comparing the heights
of adjacent and overlapping patches. No depth edges were found at
this level.

The Terzopoulos surface interpolation algorithm was then run,
using the unambiguous depth points as depth constraints. After
stabilization, surface points of high gradient were found and
marked as depth edges. These new depth edges are shown in Figure
5. Surface interpolation was done again, but smoothing was not
done across the new depth edges. The final surface for this level
is shown in Figure 6. A contour plot is shown in Figure 7.

The entire process was repeated at the 128x128 level, but using
the surface obtained at the 64x64 level as a disparity estimate
for matching. The 128x128 level is different from the 64x64 level
since edges were detected by comparing adjacent planar patches.
After the Terzopoulos algorithm was run, these edges were modified
and extended. Figure 8 shows the final depth edges obtained by the
algorithm for this level. Figure 9 shows the surface as a height
field, and Figure 10 shows the surface as a contour plot. Note
that the height of the surface is double that of the 64x64 level.
This is because the disparities have doubled, due to the increased
resolution.

The process was again repeated at the 256x256 level, using the
surface obtained at the 128x128 level as a disparity estimate.
Figure 11 shows the final depth edges obtained by the algorithm
for this level. Figure 12 shows the surface as a height field, and
Figure 13 shows the surface as a contour plot.

Footnote 2: The threshold in the implementation was taken to be
$3.5\sigma_N$.


5.2.  COMPARISON  WITH  GROUND  TRUTH

The images were taken using cameras looking down from a height of
approximately 40". The cameras were separated by a distance of
approximately 12", in an epipolar configuration. The baseball has
a diameter of about [...]. Although these parameters were not
measured accurately, we measured the disparity at certain image
points to obtain the ground truth. We located distinctive markings
in both images and noted the difference in their positions. The
disparities measured in this way were found to be roughly constant
at about 8 for the newspaper and about 20 at the top of the
baseball. The lateral dimensions (in pixels) of the baseball image
were also measured by hand. The true surface map was calculated by
using the equation of an ellipsoid for the ball and a plane for
the newspaper.

Footnote 3: The exact value of the estimate is not critical
because the width of the matching window is much larger than the
range of disparities for this scene.

The calculated disparities at zero crossings were compared to
the ideal disparities at the same locations. The comparison is
shown in Table 2. Figures for each level in Table 2a are for the
initial matches for that level, obtained by a search in the vicinity
of the positions predicted by the coarser level (or constant, if
initial) surface estimate. If a zero crossing match is ambiguous,
the closest depth value to the ideal was taken. The data show that
the assumption made about the magnitude of the disparity noise was
reasonable. Since the width of the filter for all three levels was
$w = 6$,

$$\sigma_0 = \frac{w}{2\sqrt{2}} \approx 2.12$$

is the standard deviation of the Gaussian, and we have assumed that
the standard deviation of the noise was

$$\sigma_N = (1/2)\,\sigma_0 = 1.06$$

In each of the cases, about 95% of the errors were less than 2
pixels, which is consistent with the value of $\sigma_N$. The data
also show that roughly 5% of the points have large errors and
should be treated as mismatches.
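The noise assumption can be checked with a few lines of arithmetic. The sketch below assumes the usual relation $w = 2\sqrt{2}\,\sigma_0$ between the filter width and the Gaussian standard deviation, and a zero-mean Gaussian error model (both are assumptions of this illustration). It compares the predicted fraction of errors below 2 pixels with the counts listed in Table 2a.

```python
import math


def noise_sigma(filter_width):
    """Return (sigma0, sigma_n) for a filter of the given width.

    Assumes w = 2*sqrt(2)*sigma0 and a disparity noise of sigma0 / 2.
    """
    sigma0 = filter_width / (2.0 * math.sqrt(2.0))
    return sigma0, 0.5 * sigma0


def fraction_within(bound, sigma):
    """Probability that a zero-mean Gaussian error is smaller than bound."""
    return math.erf(bound / (sigma * math.sqrt(2.0)))


sigma0, sigma_n = noise_sigma(6.0)
predicted = fraction_within(2.0, sigma_n)

# (matches with error below 2 pixels, total matches) per level, from Table 2a.
table_2a = {
    64: (211 + 20, 242),
    128: (1438 + 101, 1597),
    256: (4889 + 643, 5967),
}
observed = {level: good / total for level, (good, total) in table_2a.items()}

print(f"sigma0 = {sigma0:.2f}, sigma_N = {sigma_n:.2f}")
print(f"predicted fraction below 2 pixels: {predicted:.3f}")
for level, value in observed.items():
    print(f"level {level}: observed fraction {value:.3f}")
```

Under these assumptions the Gaussian model and the tabulated counts both put the fraction of sub-2-pixel errors in the mid-90% range.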

Figures for each level in Table 2b are for the zero crossings
remaining after resolving ambiguities and removing mismatches.
The data show that most of the zero crossings that were removed
were points that had large errors. Most of the remaining points
with large errors lie near the contour of the ball. Figure 14 shows
the locations of the remaining points with 3 or more pixels of
disparity error, for the 256x256 level.

The ideal surface for the 256x256 level is shown as a height
field in Figure 15, and a contour map in Figure 16. The ideal
surface agrees with the surface obtained to within 1 pixel in most
places.

5.3.  COMMENTS  ON  PERFORMANCE 

The depth edges detected by the algorithm are occasionally
missing or misplaced. In the future, we plan to incorporate a 3D
edge smoothness constraint so that they form smooth contours.

The surface interpolation step using the Terzopoulos algorithm
is probably unnecessary, because the surface information is already
available locally in the form of the planar patches. In fact, a good
approximation to the final result is obtained by just averaging the
heights of the overlapping patches at each point, and this
approximation is used as the starting surface for the Terzopoulos
algorithm. However, a more sophisticated method of combining the
overlapping patches appears necessary to obtain the final surface,
because the surface obtained by just averaging looks blocky.
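The averaging of overlapping patches mentioned above can be sketched as follows. The patch representation (an axis-aligned rectangle together with plane coefficients) and the function name are hypothetical choices made for this illustration, not the data structures of the implementation described in the paper.

```python
def average_patches(width, height, patches):
    """Average the heights of overlapping planar patches at each pixel.

    Each patch is a tuple (x0, y0, x1, y1, a, b, c): the plane
    z = a*x + b*y + c, valid for x0 <= x < x1 and y0 <= y < y1.
    Pixels covered by no patch are returned as None.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, a, b, c in patches:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                total[y][x] += a * x + b * y + c
                count[y][x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else None
         for x in range(width)]
        for y in range(height)
    ]


# Two constant-height patches that overlap in the column x = 2.
patches = [(0, 0, 3, 2, 0.0, 0.0, 1.0), (2, 0, 4, 2, 0.0, 0.0, 3.0)]
surface = average_patches(4, 2, patches)
print(surface[0])
```

The step between the two plateaus in this toy case shows why plain averaging tends to look blocky: the result changes abruptly wherever the set of contributing patches changes.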

The algorithm occasionally has trouble identifying mismatched
points near the border of the image. This happens whenever the
error points do not have enough correct points nearby. Any error
points near the image border are less likely to be corrected since
they have fewer surrounding points and hence fewer surrounding
correct points. Therefore, mismatched points are more likely to
survive near the image border than away from the border. This is
observable at the right border of the 256x256 surface, where a few
points with incorrect depth values cause the surface to rise
sharply. This problem will be addressed in the future.

* Actually, the disparity of the newspaper rises gradually from
about 7.5 at the top of the image to about 9.5 at the bottom. The
gradual rise can be observed in the reconstructed surface,
particularly at the 256x256 level.


Table 2

Accuracy of Disparities at Zero Crossings

Table 2a: Initial matches suggested by coarser level.

error        64            128            256
0 - 1     211 ( 87%)   1438 ( 90%)   4889 ( 82%)
1 - 2      20 (  8%)    101 (  6%)    643 ( 11%)
2 - 3       1 (  0%)      7 (  0%)     67 (  1%)
3 - 4       5 (  2%)      2 (  0%)     86 (  1%)
> 4         5 (  2%)     49 (  3%)    282 (  5%)
Totals    242 (100%)   1597 (100%)   5967 (100%)

Table 2b: Final matches after surface-based processing.

error        64            128            256
0 - 1     210 ( 89%)   1431 ( 93%)   4834 ( 87%)
1 - 2      21 (  9%)    101 (  7%)    631 ( 11%)
2 - 3       1 (  0%)      6 (  0%)     51 (  1%)
3 - 4       5 (  2%)      1 (  0%)     15 (  0%)
> 4         0 (  0%)      2 (  0%)     40 (  1%)
Total     237 (100%)   1541 (100%)   5571 (100%)

6.  CONCLUSIONS 

The  following  are  some  salient  features  of  our  approach  to 
stereo. 

Integration of Matching and Interpolation

The most novel characteristic of our approach is the use of
the surface smoothness criterion for stereo matching, and thus an
integration of matching and surface interpolation operations.
This is in contrast with the traditional sequential ordering of
matching followed by interpolation. The control passes back and
forth between matching and interpolation processes, each
depending on the result of the other to make progress, and
generating a progressively refined set of depth maps of a scene at
increasing degree of resolution. A given coarse level surface
predicts the locations of edge matches at the next finer level.
The matched features at the finer level provide a more refined
surface which in turn predicts pairs of edges to be matched at
the next finer level of resolution.
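The alternation between matching and interpolation can be summarized as a short control loop. The sketch below is only a schematic of that control flow: the function names, the single-point toy data and the "disparity doubles when the resolution doubles" prediction rule are all hypothetical stand-ins for the real matching and interpolation steps.

```python
def coarse_to_fine(levels, initial_estimate, match, interpolate):
    """Alternate matching and interpolation from coarse to fine levels."""
    estimate = initial_estimate
    history = []
    for level in levels:
        matches = match(level, estimate)        # guided by the coarser surface
        estimate = interpolate(level, matches)  # refined surface at this level
        history.append((level, estimate))
    return estimate, history


# Toy stand-ins: one scene point, candidate disparities in pixels per level.
candidates = {64: [2.0, 5.0], 128: [4.1, 9.0], 256: [8.3, 15.0]}


def toy_match(level, coarser_estimate):
    """Pick the candidate closest to the disparity predicted by the coarser level."""
    predicted = 2.0 * coarser_estimate
    return [min(candidates[level], key=lambda d: abs(d - predicted))]


def toy_interpolate(level, matches):
    """Trivial 'surface': the mean of the accepted matches."""
    return sum(matches) / len(matches)


final, history = coarse_to_fine([64, 128, 256], 1.0, toy_match, toy_interpolate)
print(final, history)
```

In this toy run the ambiguous candidates at each level are resolved by the prediction carried over from the coarser level, which is the role the surface estimate plays in the approach described above.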

Occluding and Ridge Contours

Another important characteristic of our approach is that
our smoothness constraint explicitly incorporates the existence of
depth and orientation discontinuities in the computation. It is
fairly domain-independent, i.e., it uses no constraint other than
the assumption that objects in the real world tend to have
smooth surfaces, i.e., the depth varies gradually except across
relatively rare occluding and ridge contours. These contours are
constantly detected and a smooth surface is interpolated allowing
depth/slope discontinuity across the contours, thus implementing
the piecewise-smooth model of real world objects. In our current
implementation, we detect only occluding contours. This causes no
problems with the baseball image. We are currently incorporating
the detection of ridge contours in our algorithm.


There is an important positive side effect of explicit detection
of contours. We can identify the regions corresponding to the
parts of the scene not visible from each camera. In case of the
baseball image these are the regions immediately to the left of
the left occluding border, and to the right of the right occluding
border. The mismatches in these regions can now be avoided.
We plan to incorporate this feature in our algorithm.


Continuity of Discontinuities

Finally, our approach enforces smoothness and continuity of
3-D occluding and ridge contours. This constraint is based
upon the assumption that real world objects have surfaces that
have smooth borders. We have not incorporated such 3D
smoothing in our current implementation. Nor have we done
elaborate 2D extension of edge segments detected from adjacent
patches. These are other additions we plan to make to our
current implementation.

Our current implementation of plane fitting to depth points,
and edge detection from the resulting patches, is preliminary
and needs refinement. The poor performance of these steps
results in the errors observed near image borders, near the
baseball borders, and the poor quality of contours. We expect to
improve this performance. Once we have more accurate patches,
we plan to perform the surface interpolation by combining the
various planar patches at each point, and not use the global
interpolation performed by the Terzopoulos algorithm.

One final comment before we close. The baseball image pair
we have chosen to run our algorithm on presents a particularly
harsh test of the algorithm near the baseball border. This is
because large surface steepness and occluding contours occur at
the same locations. This makes the need for better planar patch
detection and edge detection even more important.


References 


[Arno78]
Arnold, D., "Local context in matching edges for stereo
vision," Proc. Image Understanding Workshop, May 1978.

[Bake81]
Baker, H.H., Depth from Edge and Intensity Based Stereo,
Ph.D. Thesis, University of Illinois at Urbana-Champaign,
1981.

[Barn80]
Barnard, S.T. and W.B. Thompson, "Disparity analysis of
images," IEEE Trans. Pattern Anal. Machine Intell., July
1980.

[Barn82]
Barnard, S.T. and M.A. Fischler, "Computational Stereo,"
Computing Surveys, vol. 14, no. 4, December 1982.

[Berz84]
Berzins, Valdis, "Accuracy of Laplacian Edge Detectors,"
CVGIP, vol. 27, pp. 195-210, 1984.

[Genn77]
Gennery, D., "A stereo vision system for an autonomous
vehicle," IJCAI, 1977, p. 576.

[Grim81]
Grimson, W.E.L., From Images to Surfaces, MIT Press,
Cambridge, Massachusetts, 1981.

[Hann80]
Hannah, M.J., "Bootstrap Stereo," Proc. Image Understanding
Workshop, College Park, Md., April 1980.

[Hend79]
Henderson, R.L., et al., "Automatic stereo reconstruction of
man-made targets," SPIE, vol. 186, no. 6, 1979.

[Hoff85]
Hoff, W.A., and N. Ahuja, "Depth from Stereo," Fourth
Scandinavian Conference on Image Analysis, June 18-20,
1985, Trondheim, Norway, pp. 761-768.

[Mayh81]
Mayhew, J.E.W., and J.P. Frisby, "Psychophysical and
Computational Studies towards a Theory of Human
Stereopsis," Artificial Intelligence 17, 1981, pp. 349-385.

[Mora81]
Moravec, H., "Rover visual obstacle avoidance," Proc. 7th
IJCAI, August 1981, pp. 785-790.

[Pant78]
Panton, D.J., "A flexible approach to digital stereo
mapping," Photogramm. Eng. Remote Sensing 44, 12, Dec. 1978,
pp. 1499-1512.

[Terz83]
Terzopoulos, Demetri, "The role of constraints and
discontinuities in visible-surface reconstruction," Proc. IJCAI,
pp. 1073-1077, Karlsruhe, August 1983.


Figure 1.

An experiment with human stereo vision to contrast our surface
smoothness constraint for stereo matching against the traditional
constant-local-disparity constraint. An ambiguous random dot
stereogram is shown at the top and the two surfaces predicted by
the constraints are shown below. Most observers see the bottom
surface. (Panel labels: "Surface predicted by the
smoothness-of-disparity constraint"; "Surface actually perceived
by human viewers".)

Figure 2.

A schematic view of control flow and computation in our proposed
stereo algorithm.

Figure 3. Original stereo pair of images.

Figure 4.

Zero crossings for the 64x64, 128x128, and 256x256 levels of
resolution.


SECTION  III 


TECHNICAL  REPORTS  PRESENTED 




STRUCTURE  FROM  MOTION  WITHOUT  CORRESPONDENCE:  GENERAL  PRINCIPLE 


Ken-ichi  Kanatani* 


Center  for  Automation  Research 
University  of  Maryland 
College  Park,  MD  20742 


ABSTRACT 

A general principle is given for detecting 3D structure and
motion from an image sequence without using point-to-point
correspondence. The procedure consists of two stages: (i)
determination of the flow parameters, which completely characterize
the motion of the planar part of the object, and (ii) computation of
3D recovery from these flow parameters. The first stage is done by
measuring features of the image sequence. The second stage is
analytically expressed in terms of invariants with respect to
coordinate changes. Typical features and relations to stepwise
tracing are also discussed.


1.  INTRODUCTION 

Recovery of 3D structure and motion from a 2D image
sequence is one of the most challenging problems in computer
vision. Most existing schemes are classified into two types.
One is the correspondence-based approach, which does not
assume any particular model of the object except the rigidity
of motion and uses point-to-point correspondence explicitly.
The 3D structure and motion are recovered numerically [1-5].
Another is the flow-based approach, which employs a specific
model of the object and pays attention to global characteristics
of the optical flow such as vanishing points [6-9]. This idea is
fully developed by Kanatani [10-12]; if the object is a plane,
the 3D structure and motion are given analytically in terms of
invariants with respect to coordinate changes on the image
plane. These invariants are derived by means of irreducible
reduction of the 2D rotation group.

Although the flow-based approach does not make use of
point-to-point correspondence explicitly, the optical flow itself
is usually obtained by detecting the point-to-point
correspondence between two successive images, and this
correspondence detection is a time consuming process [13-17].
Kanatani [18-20] proposed schemes which do not use the
correspondence when the object is a planar surface. In this
paper, we first summarize the analytical results of Kanatani
[10-12] and then generalize Kanatani's schemes [18-20] so that
those analytical results can fit in the present new setting.


2.  3D  MOTION  FROM  FLOW  PARAMETERS 

We assume that the image under consideration is decomposed
into planar or almost planar regions, say by the method discussed
by Kanatani [10,11]. Now, attention is paid to each region regarded
as planar. Take a Cartesian xy-coordinate system on the image plane
and the z-axis perpendicular to it. Let $z = px + qy + r$ be the
equation of that plane. The coefficients p and q are the components
of the gradient of the plane, and r represents the absolute depth
from the image plane. Let (0,0,r), the intersection between the
plane and the z-axis, be a reference point (Fig. 1). The
instantaneous rigid motion is specified by translation velocity
(a,b,c) at the reference point and rotation velocity
$(\omega_1,\omega_2,\omega_3)$ around it (i.e., with rotation axis
orientation $(\omega_1,\omega_2,\omega_3)$ and angular velocity
$\sqrt{\omega_1^2+\omega_2^2+\omega_3^2}$ (rad/sec) screwwise around
it). Hence, our target is to reconstruct the nine structure and
motion parameters p, q, r, a, b, c, $\omega_1$, $\omega_2$ and
$\omega_3$ from observation of the projected image motion.

* Permanent address: Department of Computer Science, Gunma
University, Kiryu, Gunma 376, Japan.

(1)  PERSPECTIVE  PROJECTION 

Let (0,0,-f), the point on the z-axis at distance f from the
image plane on the negative side, be the viewpoint or focus of
the camera. A point (X,Y,Z) in the scene is projected to
$(fX/(f+Z),\, fY/(f+Z))$ on the image plane. If the point is on the
plane $z = px + qy + r$ which is moving as described above, it is
easy to show that the following optical flow is induced at
point (x,y) on the image plane:

$$u = u_0 + Ax + By + (Ex + Fy)x,$$
$$v = v_0 + Cx + Dy + (Ex + Fy)y, \tag{2.1}$$

where eight flow parameters are given by

$$u_0 = \frac{fa}{f+r}, \qquad v_0 = \frac{fb}{f+r},$$
$$A = p\omega_2 - \frac{pa + c}{f+r}, \qquad B = q\omega_2 - \omega_3 - \frac{qa}{f+r},$$
$$C = -p\omega_1 + \omega_3 - \frac{pb}{f+r}, \qquad D = -q\omega_1 - \frac{qb + c}{f+r}, \tag{2.2}$$
$$E = \frac{1}{f}\left(\omega_2 + \frac{pc}{f+r}\right), \qquad F = \frac{1}{f}\left(-\omega_1 + \frac{qc}{f+r}\right).$$

In other words, what we are viewing is a very restricted form
of motion whose velocities are specified only by eight flow
parameters $u_0$, $v_0$, A, B, C, D, E and F. If these parameters
are the same, motions seem identical to the viewer. Thus, our
procedure is divided into two stages. First, we detect the flow
parameters $u_0$, $v_0$, A, B, C, D, E and F from a given image
sequence. Next, we compute the structure and motion parameters
p, q, r, a, b, c, $\omega_1$, $\omega_2$ and $\omega_3$ from these
flow parameters.
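The relation between the nine structure and motion parameters and the eight flow parameters can be checked numerically. The sketch below assumes the geometry stated in the text (viewpoint at (0,0,-f), plane z = px + qy + r, rigid velocity t + ω × (X - X_ref) about the reference point (0,0,r)); the expressions in `flow_parameters` follow from that geometry, and the function names and the numerical values are illustrative only.

```python
def flow_parameters(p, q, r, a, b, c, w1, w2, w3, f):
    """Eight flow parameters (u0, v0, A, B, C, D, E, F) of a moving plane."""
    d = f + r
    return (
        f * a / d, f * b / d,
        p * w2 - (p * a + c) / d, q * w2 - w3 - q * a / d,
        -p * w1 + w3 - p * b / d, -q * w1 - (q * b + c) / d,
        (w2 + p * c / d) / f, (-w1 + q * c / d) / f,
    )


def flow(params, x, y):
    """Optical flow (u, v) at image point (x, y), the form of eqns (2.1)."""
    u0, v0, A, B, C, D, E, F = params
    s = E * x + F * y
    return u0 + A * x + B * y + s * x, v0 + C * x + D * y + s * y


def flow_direct(p, q, r, a, b, c, w1, w2, w3, f, x, y):
    """Same flow, obtained by differentiating the perspective projection."""
    d = f * (f + r) / (f - p * x - q * y)      # f + Z of the scene point
    X, Y, Z = x * d / f, y * d / f, d - f      # back-projection onto the plane
    vx = a + w2 * (Z - r) - w3 * Y             # rigid velocity: t + w x (X - X_ref)
    vy = b + w3 * X - w1 * (Z - r)
    vz = c + w1 * Y - w2 * X
    return (f * vx - x * vz) / d, (f * vy - y * vz) / d


scene = dict(p=0.2, q=-0.1, r=5.0, a=0.3, b=-0.2, c=0.1,
             w1=0.05, w2=-0.02, w3=0.03, f=2.0)
params = flow_parameters(**scene)
u_model, v_model = flow(params, 0.4, -0.3)
u_direct, v_direct = flow_direct(x=0.4, y=-0.3, **scene)
print(u_model, u_direct, v_model, v_direct)
```

The two routes agree to rounding error, which illustrates that the eight flow parameters carry everything the image motion of a planar patch can reveal.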




The second stage is performed by solving the non-linear
simultaneous equations (2.2) as follows (Appendix A): First,
compute

$$U_0 = u_0 + iv_0, \qquad T = A + D, \qquad R = C - B,$$
$$S = (A - D) + i(B + C), \qquad K = E + iF, \tag{2.3}$$

where i is the imaginary unit. Hence, $U_0$, K and S are complex
numbers. If we put $V = a + ib$, $P = p + iq$ and
$W = \omega_1 + i\omega_2$, then V, c, P, W and $\omega_3$ are
given by

$$V = \frac{(f+r)U_0}{f}, \qquad c = (f+r)\,c',$$
$$P(c') = \frac{1}{2c'}\left[fK - \frac{U_0}{f} \pm \sqrt{\left(fK - \frac{U_0}{f}\right)^2 - 4c'S}\,\right],$$
$$W(c') = \frac{i}{2}\left[fK - \frac{U_0}{f} \mp \sqrt{\left(fK - \frac{U_0}{f}\right)^2 - 4c'S}\,\right] + \frac{iU_0}{f}, \tag{2.4}$$
$$\omega_3(c') = \frac{1}{2}\left(R + \mathrm{Re}\left[P(c')\left(W(c')^* + iU_0^*/f\right)\right]\right),$$
$$c' = -\frac{1}{2}\left(T + \mathrm{Im}\left[P(c')\left(W(c')^* + iU_0^*/f\right)\right]\right),$$

where Re[.] and Im[.] denote the real and the imaginary part
respectively and * the complex conjugate. Here, P, W and $\omega_3$
are functions of c', and c' is given by solving the last of eqns
(2.4). There exists only one non-zero solution c'. In fact, if
we substitute the expressions for P(c') and W(c') in it, the
equation reduces to a cubic equation in c' (Appendix A).
Since an explicit form of the solution of a cubic equation
exists, we can express the solution c' explicitly, although in
a complicated form, if we wish. However, application of an
iteration scheme seems more feasible. In any case, the problem
is completely solved analytically, and we find that (i) the
absolute depth r is indeterminate, (ii) a/(f+r), b/(f+r) and
c/(f+r) are uniquely determined, and (iii) there exist two sets
of solutions for p, q, $\omega_1$, $\omega_2$ and $\omega_3$, one
being true and the other spurious. However, the spurious solution
disappears if two or more planar regions of the same object are
observed because $\omega_1$, $\omega_2$ and $\omega_3$ must be
common to them. Numerical schemes of 3D recovery from
point-to-point correspondence have been known [2-4] and the
existence of the spurious solution was pointed out [9], but
analytical expressions like eqns (2.4) have not been known.

(2)  ORTHOGRAPHIC  APPROXIMATION 

If we take the limit $f \to \infty$ of a large focal length f in
eqns (2.2), we obtain the following orthographic approximation:

$$u_0 = a, \qquad v_0 = b,$$
$$A = p\omega_2, \quad B = q\omega_2 - \omega_3, \quad C = -p\omega_1 + \omega_3, \quad D = -q\omega_1, \tag{2.5}$$
$$E = 0, \qquad F = 0,$$

and the solution is explicitly given as follows (Appendix B):

$$V = U_0, \qquad \omega_3 = \frac{1}{2}\left(R \pm \sqrt{SS^* - T^2}\right),$$
$$P = \frac{\sqrt{SS^*}}{k}\exp i\left(\frac{\pi}{4} + \frac{1}{2}\arg(S) + \frac{1}{2}\arg\big(2\omega_3 - (R + iT)\big)\right), \tag{2.6}$$
$$W = k\exp i\left(\frac{\pi}{4} + \frac{1}{2}\arg(S) - \frac{1}{2}\arg\big(2\omega_3 - (R + iT)\big)\right),$$

where arg denotes the argument. Here, k is an indeterminate
scale factor. Thus, (i) the absolute depth r and the velocity c
in the z-direction are indeterminate, (ii) an indeterminate scale
factor k is involved, and (iii) there exist two types of solutions,
one being true and the other spurious. However, the spurious
solution disappears if two or more planar regions of the same
object are observed because $\omega_1$, $\omega_2$ and $\omega_3$
must be common to them. 3D recovery from point-to-point
correspondence under orthographic projection was first studied by
Ullman [2], and the fact that an indeterminate scale factor is
necessarily involved was already pointed out [5]. However,
analytical expressions of the solution have not been known.

Fig. 1. A plane of equation z = px + qy + r is moving with
translation velocity (a,b,c) at (0,0,r) and rotation velocity
$(\omega_1,\omega_2,\omega_3)$ around it. An optical flow is induced
on the xy-plane by perspective projection, (0,0,-f) being the
viewpoint.
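The two-fold ambiguity under orthographic projection can be seen in a small numerical experiment. The sketch below assumes the orthographic flow parameters A = pω2, B = qω2 - ω3, C = -pω1 + ω3, D = -qω1, builds T, R and S from them, and evaluates both candidate solutions. Function names and test values are illustrative, and the scale factor k is simply set to the true |W| so that the true solution can be recognized.

```python
import cmath
import math


def orthographic_flow_parameters(p, q, w1, w2, w3):
    """Linear flow parameters (A, B, C, D) in the orthographic limit."""
    return p * w2, q * w2 - w3, -p * w1 + w3, -q * w1


def orthographic_solutions(A, B, C, D, k=1.0):
    """Return both candidate solutions (w3, P, W) for a given scale k."""
    T = A + D
    R = C - B
    S = complex(A - D, B + C)
    root = math.sqrt(max(abs(S) ** 2 - T * T, 0.0))
    base = math.pi / 4.0 + 0.5 * cmath.phase(S)
    solutions = []
    for sign in (1.0, -1.0):
        w3 = 0.5 * (R + sign * root)
        half = 0.5 * cmath.phase(complex(2.0 * w3 - R, -T))
        P = (abs(S) / k) * cmath.exp(1j * (base + half))
        W = k * cmath.exp(1j * (base - half))
        solutions.append((w3, P, W))
    return solutions


p, q, w1, w2, w3 = 0.3, -0.2, 0.1, 0.2, 0.05
P_true, W_true = complex(p, q), complex(w1, w2)
A, B, C, D = orthographic_flow_parameters(p, q, w1, w2, w3)
S = complex(A - D, B + C)
solutions = orthographic_solutions(A, B, C, D, k=abs(W_true))
for w3_est, P_est, W_est in solutions:
    print(w3_est, P_est, W_est)
```

One of the two printed candidates reproduces the true ω3 and, up to a common sign of P and W, the true gradient and rotation; the other is the spurious solution that a second planar region of the same object would rule out.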

(3)  PSEUDO-ORTHOGRAPHIC  APPROXIMATION

If we omit terms of $O(1/f^2)$ but retain terms of $O(1/f)$ in
eqns (2.2), E and F are replaced by

$$E = \omega_2/f, \qquad F = -\omega_1/f \tag{2.7}$$

respectively, which we call the pseudo-orthographic
approximation. The solution is analytically given as follows
(Appendix C):

$$V = \frac{(f+r)U_0}{f}, \qquad W = ifK, \qquad P = \frac{S}{fK - U_0/f},$$
$$\omega_3 = \frac{1}{2}\left(R + \mathrm{Im}[Se^{-2i\alpha}]\right), \qquad c = -\frac{f+r}{2}\left(T - \mathrm{Re}[Se^{-2i\alpha}]\right), \tag{2.8}$$
$$\alpha \equiv \arg(fK - U_0/f).$$

Hence, (i) the absolute depth r is indeterminate, (ii) a/(f+r),
b/(f+r) and c/(f+r) are uniquely determined, and (iii) p, q,
$\omega_1$, $\omega_2$ and $\omega_3$ are uniquely determined. It
should be noted that no spurious solution exists.
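The pseudo-orthographic solution is simple enough to verify by a round trip. In the sketch below, the forward model (the perspective expressions for u0, v0, A, B, C, D together with E = ω2/f and F = -ω1/f) is an assumption spelled out in the code; the recovery then uses only the complex quantities U0, T, R, S and K. Names and numbers are illustrative.

```python
import cmath


def pseudo_orthographic_flow(p, q, r, a, b, c, w1, w2, w3, f):
    """Flow parameters with E and F truncated to first order in 1/f."""
    d = f + r
    return (
        f * a / d, f * b / d,
        p * w2 - (p * a + c) / d, q * w2 - w3 - q * a / d,
        -p * w1 + w3 - p * b / d, -q * w1 - (q * b + c) / d,
        w2 / f, -w1 / f,
    )


def recover(params, f):
    """Recover P, W, w3 and the depth-scaled translation from flow parameters."""
    u0, v0, A, B, C, D, E, F = params
    U0 = complex(u0, v0)
    T, R = A + D, C - B
    S = complex(A - D, B + C)
    K = complex(E, F)
    G = f * K - U0 / f                    # its argument is the angle alpha
    rotated = S * cmath.exp(-2j * cmath.phase(G))
    return {
        "P": S / G,                       # p + i q
        "W": 1j * f * K,                  # w1 + i w2
        "w3": 0.5 * (R + rotated.imag),
        "V_scaled": U0 / f,               # (a + i b) / (f + r)
        "c_scaled": -0.5 * (T - rotated.real),   # c / (f + r)
    }


truth = dict(p=0.2, q=-0.1, r=5.0, a=0.3, b=-0.2, c=0.1,
             w1=0.05, w2=-0.02, w3=0.03, f=2.0)
solution = recover(pseudo_orthographic_flow(**truth), truth["f"])
print(solution)
```

Only ratios to f + r are returned for the translation, which mirrors the statement that the absolute depth r stays indeterminate while everything else is fixed uniquely.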

The parameters of eqns (2.3) have geometrical meanings
[10,11]: $U_0$ translation, T divergence, R rotation, S shearing
and K fanning (Fig. 2). They are transformed by a coordinate
rotation by $\theta$ on the image plane as

$$T \to T, \qquad R \to R,$$
$$U_0 \to U_0e^{-i\theta}, \qquad K \to Ke^{-i\theta}, \tag{2.9}$$
$$S \to Se^{-2i\theta}.$$

(See Appendix D.) In other words, T and R (as well as r, c
and $\omega_3$) are (absolute) invariants of weight 0 (or scalars),
$U_0$ and K (as well as V, P and W) are (relative) invariants of
weight -1 (or vectors), and S is a (relative) invariant of weight
-2 (or a tensor) [12].
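The transformation rules under a rotation of the image axes can be confirmed numerically. The sketch below assumes that the flow has the form u = u0 + Ax + By + (Ex + Fy)x, v = v0 + Cx + Dy + (Ex + Fy)y, so that a rotation of the axes by θ maps the vector parts with the transposed rotation matrix and the linear part by a similarity transform. The helper names and numerical values are illustrative.

```python
import cmath
import math


def invariants(u0, v0, A, B, C, D, E, F):
    """Return (U0, T, R, S, K) built from the eight flow parameters."""
    return (complex(u0, v0), A + D, C - B,
            complex(A - D, B + C), complex(E, F))


def rotate_parameters(params, theta):
    """Flow parameters expressed in image axes rotated by theta."""
    u0, v0, A, B, C, D, E, F = params
    cs, sn = math.cos(theta), math.sin(theta)

    def to_new(x, y):                      # vector components in the new axes
        return cs * x + sn * y, -sn * x + cs * y

    u0n, v0n = to_new(u0, v0)
    En, Fn = to_new(E, F)
    # M' = Q^T M Q with M = [[A, B], [C, D]] and Q = [[cs, -sn], [sn, cs]].
    m00, m01 = A * cs + B * sn, -A * sn + B * cs
    m10, m11 = C * cs + D * sn, -C * sn + D * cs
    An, Bn = cs * m00 + sn * m10, cs * m01 + sn * m11
    Cn, Dn = -sn * m00 + cs * m10, -sn * m01 + cs * m11
    return u0n, v0n, An, Bn, Cn, Dn, En, Fn


theta = 0.7
params = (0.1, -0.2, 0.05, 0.02, -0.03, 0.04, 0.01, -0.015)
U0, T, R, S, K = invariants(*params)
U0r, Tr, Rr, Sr, Kr = invariants(*rotate_parameters(params, theta))
phase = cmath.exp(-1j * theta)
print(abs(Tr - T), abs(Rr - R), abs(U0r - U0 * phase),
      abs(Kr - K * phase), abs(Sr - S * phase ** 2))
```

All five printed differences are at rounding level, matching the weights 0, -1 and -2 assigned to the scalar, vector and tensor quantities.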


3.  FLOW  PARAMETER  ESTIMATION  BY  FEATURES 

Let X(x,y) represent the image. For example, if the image
consists of gray-levels, X(x,y) denotes its intensity at point
(x,y). If the image consists of colors, X(x,y) may be a
vector valued function corresponding to R, G and B. If the
image consists of points and lines, X(x,y) has
delta-function-like singularities. In any case, we define a feature
of image X(x,y) as a functional, i.e., a map F[.] from the set of
images X(x,y) to real numbers.

Suppose that there is an optical flow u(x,y), v(x,y) on the
image plane and that the image is moving according to this
flow. Then, if X(x,y) is an image at time t, it changes at time
$t+\delta t$ after a short time interval into

$$X(x - u(x,y)\delta t,\; y - v(x,y)\delta t) = X(x,y) - \frac{\partial X}{\partial x}u(x,y)\delta t - \frac{\partial X}{\partial y}v(x,y)\delta t + \cdots. \tag{3.1}$$

Then, a feature F[X] at time t changes at $t+\delta t$ into
$F[X] + DF[X]\delta t + \cdots$, and the change rate DF[.] is in
general a linear functional in u(x,y) and v(x,y).

In view of the optical flow of eqns (2.1), this means that
we have a linear equation of the form

$$DF[X] = C_1[X]u_0 + C_2[X]v_0 + C_3[X]A + \cdots + C_7[X]E + C_8[X]F, \tag{3.2}$$

where $C_1[.]$, ..., $C_8[.]$ are functionals derived from the given
feature functional F[.], so that they are all known functionals.
On the other hand, the change rate DF[.] of feature F[.] can
be estimated by difference schemes. For example, observe the
image at time t and compute feature F(t). Next, observe the
image at time $t+\delta t$ after a short time interval and compute
the same feature $F(t+\delta t)$. Then, the time change DF[X] is
approximated by $(F(t+\delta t) - F(t))/\delta t$, or we can use a
higher order numerical differentiation scheme if observations are
made on three or more consecutive images. Thus, all quantities
except $u_0$, $v_0$, A, B, C, D, E and F in eqn (3.2) are directly
computed from an image sequence without requiring point-to-point
correspondence. Since an equation of the form of eqn
(3.2) provides a linear constraint, we obtain a set of
simultaneous linear equations to solve for the flow parameters
$u_0$, $v_0$, ..., E and F if we provide eight or more independent
feature functionals $F_1[.]$, $F_2[.]$, ...
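Solving for the flow parameters from eight or more such linear constraints is an ordinary least-squares problem. The sketch below is a minimal illustration using the normal equations and Gaussian elimination; the coefficient rows are random stand-ins for the functionals C_1[X], ..., C_8[X] (they are not derived from any real feature), and the function names are hypothetical.

```python
import random


def change_rate(feature_before, feature_after, dt):
    """Forward-difference estimate of the change rate DF of a feature."""
    return (feature_after - feature_before) / dt


def solve_least_squares(rows, rhs):
    """Least-squares solution of rows * x = rhs via the normal equations."""
    n = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, rhs))]
        for i in range(n)
    ]
    for col in range(n):                       # elimination with pivoting
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):               # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


rng = random.Random(0)
true_flow = [0.1, -0.2, 0.05, 0.02, -0.03, 0.04, 0.01, -0.015]
# Twelve hypothetical features: one row of coefficients C_1..C_8 per feature.
rows = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(12)]
dt = 0.1
rates = []
for row in rows:
    exact_rate = sum(ck * pk for ck, pk in zip(row, true_flow))
    before = 3.0                      # arbitrary feature value at time t
    after = before + exact_rate * dt  # feature value at time t + dt
    rates.append(change_rate(before, after, dt))
estimated_flow = solve_least_squares(rows, rates)
print(estimated_flow)
```

Using more than eight features, as here, lets the least-squares fit average out noise in the measured change rates; with exactly eight independent features the same routine reduces to solving a square system.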

The idea of using feature functionals was suggested by
Amari [21,22] and was applied to 3D recovery by Kanatani
[18-20]. However, he did not divide the computation process
into two stages as described here but tried to compute the
structure and motion parameters p, q, r, a, b, c, $\omega_1$,
$\omega_2$ and $\omega_3$ directly. This leads to a set of
simultaneous non-linear equations which are difficult to solve. He
proposed an iterative scheme which traces the motion along time,
starting from known initial values of p, q and r as described
later. Here, however, the process is divided into two stages. We
first estimate the flow parameters by solving a set of linear
equations. This poses no computational problem. Then, the structure
and motion parameters are computed in analytical terms as
described in the previous section.

As for the feature functionals, we can use those used by
Amari [21,22] and Kanatani [18-20]. We review and modify
them so that they fit in the present new setting.

Fig. 2. (a) Translation by $U_0$. (b) Divergence by T. (c) Rotation
by R. (d) Shearing with $Q_1 = \exp(i\arg(S)/2)$ and $Q_2 = iQ_1$ as
axes of maximum extension and compression, respectively. (e) Fanning
along (E,F).

(1)  ANISOTROPY  OF  TEXTURE

Consider a surface which has a spatially homogeneous
(but not necessarily isotropic) texture consisting of line
segments. The 3D structure and motion are detected by
checking the anisotropy of the texture. This method, applicable
in the case of orthographic projection, was first suggested
by Witkin [23] and combined with integral geometry or
stereology by Kanatani [18].

Let the line texture on the image plane be dissected into
infinitesimal line elements. The orientation of each line element
is specified by angle $\theta$ from the x-axis. Since there are
two angles for the same orientation, i.e., $\theta$ and
$\theta + \pi$ designate the same orientation, we choose one of
them randomly with a probability of 1/2. Let the distribution
density $f(\theta)$ be defined in such a way that $f(\theta)d\theta$
is the summed length of those line segments, per unit area, whose
orientations are between $\theta$ and $\theta + d\theta$. By
definition, $c_0 = \int_0^{2\pi} f(\theta)d\theta$ is the total
length of the line segments per unit area. If the distribution is
isotropic, $f(\theta)$ is constant for all $\theta$. If the
distribution is nearly isotropic, the distribution density
$f(\theta)$ is approximated by a Fourier series up to the second
order

$$f(\theta) \approx \frac{c_0}{2\pi}\left[1 + a_2\cos 2\theta + b_2\sin 2\theta\right],$$
$$c_0 = \int_0^{2\pi} f(\theta)\,d\theta, \tag{3.3}$$
$$a_2 = \frac{2}{c_0}\int_0^{2\pi} f(\theta)\cos 2\theta\,d\theta, \qquad b_2 = \frac{2}{c_0}\int_0^{2\pi} f(\theta)\sin 2\theta\,d\theta.$$

Here, first order terms do not appear because of the symmetry
$f(\theta + \pi) = f(\theta)$.

If the image is changing according to orthographic optical
flow (i.e., eqns (2.1) with E = 0 and F = 0), the Fourier
coefficients $c_0$, $a_2$ and $b_2$ of eqns (3.3) change as follows
[18,29,30]:

$$\frac{d}{dt}\begin{bmatrix} c_0 \\ a_2 \\ b_2 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} c_0(a_2 - 2) & c_0b_2 & c_0b_2 & -c_0(a_2 + 2) \\ -a_2^2 + 6 & -b_2(a_2 - 4) & -b_2(a_2 + 4) & a_2^2 - 6 \\ -a_2b_2 & -b_2^2 - 4a_2 + 6 & -b_2^2 + 4a_2 + 6 & a_2b_2 \end{bmatrix}\begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}. \tag{3.4}$$

Thus, $c_0$, $a_2$ and $b_2$ serve as feature functionals, and eqn
(3.4) corresponds to eqn (3.2), although another feature must be
added to determine A, B, C and D uniquely.
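The change rates of the Fourier coefficients can be cross-checked by brute force: deform a set of line elements by a small linear flow and re-measure the coefficients. The sketch below assumes a density that is exactly of second order, a per-unit-area normalization, and a coefficient matrix with an overall factor 1/4 (derived here from the stretching and turning rates of a line element under the flow). Function names and numbers are illustrative.

```python
import math


def rate_matrix(c0, a2, b2):
    """Matrix mapping (A, B, C, D) to the rates of (c0, a2, b2)."""
    rows = [
        [c0 * (a2 - 2), c0 * b2, c0 * b2, -c0 * (a2 + 2)],
        [6 - a2 * a2, -b2 * (a2 - 4), -b2 * (a2 + 4), a2 * a2 - 6],
        [-a2 * b2, 6 - 4 * a2 - b2 * b2, 6 + 4 * a2 - b2 * b2, a2 * b2],
    ]
    return [[0.25 * value for value in row] for row in rows]


def coefficients_after(c0, a2, b2, flow, dt, samples=720):
    """Deform line elements by I + M*dt and re-measure (c0, a2, b2)."""
    A, B, C, D = flow
    area = (1 + A * dt) * (1 + D * dt) - B * C * dt * dt
    h = 2 * math.pi / samples
    s0 = sc = ss = 0.0
    for j in range(samples):
        t = (j + 0.5) * h
        length = c0 / (2 * math.pi) * (
            1 + a2 * math.cos(2 * t) + b2 * math.sin(2 * t)) * h
        x = math.cos(t) + dt * (A * math.cos(t) + B * math.sin(t))
        y = math.sin(t) + dt * (C * math.cos(t) + D * math.sin(t))
        new_length = length * math.hypot(x, y) / area   # per unit area
        phi = math.atan2(y, x)
        s0 += new_length
        sc += new_length * math.cos(2 * phi)
        ss += new_length * math.sin(2 * phi)
    return s0, 2 * sc / s0, 2 * ss / s0


c0, a2, b2 = 1.5, 0.3, -0.2
flow = (0.4, -0.1, 0.25, -0.3)          # A, B, C, D
dt = 1e-6
after = coefficients_after(c0, a2, b2, flow, dt)
numeric = [(new - old) / dt for new, old in zip(after, (c0, a2, b2))]
predicted = [sum(m * g for m, g in zip(row, flow))
             for row in rate_matrix(c0, a2, b2)]
print(numeric, predicted)
```

Because only three equations are available for the four unknowns A, B, C and D, this feature set alone cannot fix the flow, in line with the remark that another feature must be added.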

In order to measure $c_0$, $a_2$ and $b_2$ from a given image, we
must estimate the distribution density $f(\theta)$ from the
histogram of line segment orientations. To this end, we must choose
an appropriate class interval for the histogram. If it is too large,
estimation becomes crude. If it is too small, the counting for
each class is greatly affected by noise. This difficulty arises
because the definition of the distribution density $f(\theta)$
involves infinitesimals, i.e., a limit taking process.

There exists a method of estimating the distribution density
$f(\theta)$ which does not involve a limit taking process. This is
possible by a stereological technique. Instead of making a
histogram, we count the number of intersections between the line
segments and a probe line (or equally spaced parallel scanning
lines). Let $N(\theta)$ be the number of intersections per unit
length of the scanning line of orientation $\theta$. Then, the
observed intersection count $N(\theta)$ is related to the
distribution density $f(\theta)$ by what Kanatani [18,30] called
the (two-dimensional) Buffon transform:

$$N(\theta) = \int_0^{2\pi} |\sin(\theta - \theta')|\, f(\theta')\,d\theta'. \tag{3.5}$$

If the distribution density $f(\theta)$ is given by eqns (3.3), the
intersection count $N(\theta)$ becomes [18,30]

$$N(\theta) = \frac{C_0}{2\pi}\left[1 + A_2\cos 2\theta + B_2\sin 2\theta\right],$$
$$C_0 = \int_0^{2\pi} N(\theta)\,d\theta, \tag{3.6}$$
$$A_2 = \frac{2}{C_0}\int_0^{2\pi} N(\theta)\cos 2\theta\,d\theta, \qquad B_2 = \frac{2}{C_0}\int_0^{2\pi} N(\theta)\sin 2\theta\,d\theta,$$

where

$$c_0 = \frac{C_0}{4}, \qquad a_2 = -3A_2, \qquad b_2 = -3B_2. \tag{3.7}$$

Hence, we can use $C_0$, $A_2$ and $B_2$ themselves as feature
functionals. They are computed by measuring the intersection
count $N(\theta)$ and approximating the integrations of eqns (3.6)
by appropriate summations. For example, putting
$N_k = N(\pi k/N)$, $k = 0, 1, \ldots, N-1$, we may adopt the
approximation

$$C_0 = \frac{2\pi}{N}\sum_{k=0}^{N-1} N_k,$$
$$A_2 = 2\sum_{k=0}^{N-1} N_k\cos\frac{2\pi k}{N} \Big/ \sum_{k=0}^{N-1} N_k, \qquad B_2 = 2\sum_{k=0}^{N-1} N_k\sin\frac{2\pi k}{N} \Big/ \sum_{k=0}^{N-1} N_k. \tag{3.8}$$

Consider Fig. 3, for example. If we draw on it equally
spaced parallel scanning lines whose spacing is 1/22 of one
side of the square frame for orientations $\theta_k = \pi k/16$,
$k = 0, 1, \ldots, 15$ with N = 16, i.e., at 11.25° intervals, we
obtain the intersection count as shown in Fig. 4, from which we
obtain $A_2 = -0.172$ and $B_2 = 0.068$. The solid curve is the
corresponding approximation of eqns (3.8). Fig. 5 is the
recovered distribution density of eqns (3.3) estimated by using
eqns (3.7).

Fig. 3. An example of a textured surface image.

Fig. 4. The number of intersections of the texture of Fig. 3 with
parallel scanning lines of different orientations, the spacing being
1/22 of the side of the square frame. The data are normalized so
that the average is $1/2\pi$. The solid curve is the Fourier
approximation up to the second order.

Fig. 5. Estimation of the distribution density of Fig. 3 up to the
second order Fourier harmonics.
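The chain from intersection counts to density coefficients can be exercised on synthetic data. The sketch below assumes the Buffon transform in the form N(θ) = ∫|sin(θ - θ')| f(θ') dθ' over a full turn and a density that is exactly of second order; the counts are produced by numerical integration rather than by scanning a real image, and all function names are illustrative. The density coefficients are chosen so that the expected A2 and B2 equal the values quoted for Fig. 4.

```python
import math


def buffon_count(theta, c0, a2, b2, samples=20000):
    """Intersection count N(theta) for a second-order density f."""
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        t = (j + 0.5) * h
        f = c0 / (2 * math.pi) * (
            1 + a2 * math.cos(2 * t) + b2 * math.sin(2 * t))
        total += abs(math.sin(theta - t)) * f
    return total * h


def fourier_from_counts(counts):
    """Discrete estimates of C0, A2, B2 from counts at theta_k = pi*k/N."""
    n = len(counts)
    total = sum(counts)
    C0 = 2 * math.pi * total / n
    A2 = 2 * sum(c * math.cos(2 * math.pi * k / n)
                 for k, c in enumerate(counts)) / total
    B2 = 2 * sum(c * math.sin(2 * math.pi * k / n)
                 for k, c in enumerate(counts)) / total
    return C0, A2, B2


def density_coefficients(C0, A2, B2):
    """Convert count coefficients to density coefficients (c0, a2, b2)."""
    return C0 / 4.0, -3.0 * A2, -3.0 * B2


true_density = (0.25, 0.516, -0.204)          # c0, a2, b2
counts = [buffon_count(math.pi * k / 16, *true_density) for k in range(16)]
C0, A2, B2 = fourier_from_counts(counts)
recovered = density_coefficients(C0, A2, B2)
print(C0, A2, B2, recovered)
```

No histogram of orientations is ever formed: sixteen intersection counts are enough to recover the three density coefficients, which is the practical advantage of the stereological route.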

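From eqns (3.4) and (3.7), the change rates of $C_0$, $A_2$
and $B_2$ become as follows:

$$\frac{d}{dt}\begin{bmatrix} C_0 \\ A_2 \\ B_2 \end{bmatrix} = \frac{3}{4}\begin{bmatrix} -C_0(A_2 + \frac{2}{3}) & -C_0B_2 & -C_0B_2 & C_0(A_2 - \frac{2}{3}) \\ A_2^2 - \frac{2}{3} & B_2(A_2 + \frac{4}{3}) & B_2(A_2 - \frac{4}{3}) & -A_2^2 + \frac{2}{3} \\ A_2B_2 & B_2^2 - \frac{4}{3}A_2 - \frac{2}{3} & B_2^2 + \frac{4}{3}A_2 - \frac{2}{3} & -A_2B_2 \end{bmatrix}\begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}. \tag{3.9}$$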

(2)  ANISOTROPY  OF  CONTOUR 

In the above, we assumed spatial homogeneity, since
anisotropy is expressed per unit area. This assumption
assures that the portion of the texture newly coming into view
has the same statistical characteristics as the portion of the
texture going out of view. However, this assumption is not
necessary if the entire planar region is viewed, i.e., if we can
always identify the planar region that we are looking at.

Then, the distribution density $f(\theta)$ is defined in such a
way that $f(\theta)d\theta$ is just the summed length (not per unit
area) of those line segments whose orientations are between
$\theta$ and $\theta + d\theta$. By definition,
$c_0 = \int_0^{2\pi} f(\theta)d\theta$ is the total length of the
line segments. If the distribution is isotropic, $f(\theta)$ is
constant for all $\theta$. If the distribution density $f(\theta)$
is approximated by the Fourier series (3.3) up to the second order,
the change rates of $c_0$, $a_2$ and $b_2$ are given by eqn (3.4)
except that the first row of the matrix is replaced by

$$c_0(a_2 + 2) \qquad c_0b_2 \qquad c_0b_2 \qquad -c_0(a_2 - 2). \tag{3.10}$$

If we count the number of intersections between the texture
of the entire planar region in question and a probe line
(or equally spaced parallel scanning lines), and if $N(\theta)$ is
the number of intersections per unit length of the scanning line of
orientation $\theta$, then $N(\theta)$ and $f(\theta)$ are again
related by the Buffon transform of eqn (3.5). Hence, if the
distribution density is approximated by eqn (3.3), $N(\theta)$ is
given by the form of eqn (3.6), and the change rates of $C_0$,
$A_2$ and $B_2$ are given by eqn (3.9) except that the first row of
the matrix is replaced by

$$-C_0(A_2 - \tfrac{2}{3}) \qquad -C_0B_2 \qquad -C_0B_2 \qquad C_0(A_2 + \tfrac{2}{3}). \tag{3.11}$$

Fig. 6. The caliper diameter $D(\theta)$ is the distance between two
parallel lines tangent to the contour from outside.

Fig. 8. Diameters of the contours C and C* of Fig. 7 for different
orientations: white circles for C and black dots for C*. The solid
curves are Fourier approximation up to the second order.


An interesting application arises when the planar region
has no texture but its contour is viewed. Then, the contour
itself can be regarded as a texture. If the contour shape is
convex, the intersection counting is equivalent to measuring
the diameter $D(\theta)$ defined as the spacing of two parallel lines
of orientation $\theta$ tangent to the contour (Fig. 6), for every line
has two intersections if they exist (excluding the exceptional
case of tangency). The contour shape need not be convex if
the diameter is measured from outside, for in this case the
convex hull of the contour plays the role of a texture. The
convex hull is invariant with respect to projection; the convex
hull of a projected contour is the same as the projection of the
convex hull of the original contour. The diameter $D(\theta)$ and
the distribution density $f(\theta)$ of the contour are related as
follows [19]:

(3  12) 

If this function is expressed in Fourier series as in eqn
(3.6), the coefficients $C_0$, $A_2$ and $B_2$ change as in eqn (3.9) with
the first row replaced by (3.11). Consider the two contour
images $C$ and $C'$ of Fig. 7, for example. The diameters measured
at 10° intervals of orientation are plotted in Fig. 8, where
the white circles correspond to $C$ and the black ones to $C'$.
The solid curves are approximations of the form of eqn (3.6)
with $C_0$, $A_2$ and $B_2$ computed by eqns (3.8), indicating that
they fairly well characterize the data.
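
The diameter measurement described above can be sketched in a few lines. This is only an illustration under stated assumptions: the contour is given as a list of sampled points, the function name `caliper_diameter` and the ellipse test contour are hypothetical, and the diameter is taken as the extent of the contour along the normal to lines of orientation $\theta$. Because only the extreme points matter, the result depends on the convex hull of the contour alone.

```python
import numpy as np

def caliper_diameter(points, theta):
    """Spacing of the two parallel lines of orientation theta that
    touch the point set from outside (the caliper diameter)."""
    # Unit normal to a line of orientation theta.
    normal = np.array([-np.sin(theta), np.cos(theta)])
    proj = points @ normal          # signed distance of each point along the normal
    return proj.max() - proj.min()  # only the extreme (hull) points contribute

# Hypothetical test contour: an ellipse with semi-axes 2 and 1.
t = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
ellipse = np.column_stack([2.0 * np.cos(t), 1.0 * np.sin(t)])

# Diameters at 10-degree intervals of orientation, as in the text.
thetas = np.deg2rad(np.arange(0, 180, 10))
D = np.array([caliper_diameter(ellipse, th) for th in thetas])
```

For this ellipse the samples can be compared with the closed form $2\sqrt{4\sin^2\theta + \cos^2\theta}$, which is a convenient sanity check before fitting a Fourier series to measured data.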

(3) FILTERING GRAY-LEVEL IMAGES

Suppose we are observing a sequence of gray-level images
of a planar region. Amari [21, 22] suggested the use of filtering
or weighted averaging for feature detection. Namely, we
use

$$F[X] = \iint_W m(x,y)\,X(x,y)\,dx\,dy \qquad (3.13)$$

as a feature, where $m(x,y)$ is a fixed weight function of the
filter, and integration is done over a fixed domain or window
$W$ on the image plane. Suppose the area of non-zero gray-levels
is localized in the window $W$ so that $X(x,y) = 0$ along the
window boundary, and suppose the gray-level does not depend
on the gradient or the depth of the object surface. An example
is letters, lying entirely in the window $W$, drawn on a
white (or black) object surface.

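If the image $X(x,y)$ changes according to eqn (3.1), the
feature $F[X]$ becomes after a short time interval $\delta t$

$$\iint_W m X\,dx\,dy - \iint_W m(x,y)\left(\frac{\partial X}{\partial x}u(x,y) + \frac{\partial X}{\partial y}v(x,y)\right)\delta t\,dx\,dy + \cdots$$

$$= F[X] + \delta t \iint_W \left(\frac{\partial (mu)}{\partial x} + \frac{\partial (mv)}{\partial y}\right) X\,dx\,dy + \cdots, \qquad (3.14)$$

where we performed integration by parts, setting integrals
along the window boundary to be zero according to our
assumption that $X(x,y)$ is zero at the window boundary.
Thus, the change rate $DF[X]$ of the feature $F[X]$ is given by

$$DF[X] = \iint_W \left(\frac{\partial u}{\partial x}m + \frac{\partial v}{\partial y}m + u\frac{\partial m}{\partial x} + v\frac{\partial m}{\partial y}\right) X\,dx\,dy. \qquad (3.15)$$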

When the optical flow is given by eqns (2.1), functionals $C_1[.]$,
..., $C_8[.]$ of eqn (3.2) become as follows:

$$C_1[X] = \iint_W m_x X\,dx\,dy, \qquad C_2[X] = \iint_W m_y X\,dx\,dy,$$

$$C_3[X] = \iint_W (m + x m_x) X\,dx\,dy, \qquad C_4[X] = \iint_W y m_x X\,dx\,dy,$$

$$C_5[X] = \iint_W x m_y X\,dx\,dy, \qquad C_6[X] = \iint_W (m + y m_y) X\,dx\,dy, \qquad (3.16)$$

$$C_7[X] = \iint_W (3xm + x^2 m_x + xy\,m_y) X\,dx\,dy,$$

$$C_8[X] = \iint_W (3ym + xy\,m_x + y^2 m_y) X\,dx\,dy,$$

where $m_x = \partial m/\partial x$ and $m_y = \partial m/\partial y$ are known functions.
Thus, $C_1[.]$, ..., $C_8[.]$ can be implemented as filters. Here, we
assumed that $X(x,y) = 0$ at the window boundary. This
assumption is not essential, and it can be removed. Instead,
the expressions of the functionals $C_1[.]$, ..., $C_8[.]$ include terms
of line integral along the window boundary.
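
The filters above can be sketched numerically as follows. This is a minimal illustration, not the author's implementation: the grid, the Gaussian test image, the weight $m = x^2 + y$, and the names `filter_functionals` and `flow` are all assumptions. It also assumes that eqn (3.2) combines the functionals linearly with the flow parameters in the order $(u_0, v_0, A, B, C, D, E, F)$, which is what the integrands of eqn (3.16) suggest.

```python
import numpy as np

def filter_functionals(X, m, m_x, m_y, x, y, dA):
    """Riemann-sum approximation of C1..C8 of eqn (3.16) over the window."""
    w = X * dA
    return np.array([
        np.sum(m_x * w),
        np.sum(m_y * w),
        np.sum((m + x * m_x) * w),
        np.sum(y * m_x * w),
        np.sum(x * m_y * w),
        np.sum((m + y * m_y) * w),
        np.sum((3 * x * m + x**2 * m_x + x * y * m_y) * w),
        np.sum((3 * y * m + x * y * m_x + y**2 * m_y) * w),
    ])

# Window W = [-1, 1] x [-1, 1] sampled on a regular grid (assumed).
n = 201
axis = np.linspace(-1.0, 1.0, n)
x, y = np.meshgrid(axis, axis, indexing="xy")
dA = (axis[1] - axis[0]) ** 2

# Hypothetical image: a Gaussian blob that vanishes near the window boundary.
sigma, cx, cy = 0.1, 0.1, -0.2
X = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

# Hypothetical weight function and its known partial derivatives.
m = x ** 2 + y
m_x = 2 * x
m_y = np.ones_like(x)

C = filter_functionals(X, m, m_x, m_y, x, y, dA)

# Predicted change rate DF[X] for a pure translation (u0, v0) = (0.02, -0.01),
# assuming the parameter order (u0, v0, A, B, C, D, E, F).
flow = np.array([0.02, -0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
DF = flow @ C
```

Shifting the blob by the same translation over a short time and differencing $F[X]$ gives a rate close to `DF`, which is a simple way to check the signs of the integrands.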

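(4) INTEGRATION ALONG AND INSIDE THE CONTOUR

Kanatani [20] considered the case where only the bounding
contour of a planar region is observed. He proposed the
use of integration along the contour $C$ of a given fixed function
$m(x,y)$,

$$F[X] = \int_C m(x,y)\,ds, \qquad (3.17)$$

as a feature, where $ds$ denotes the line element along the contour
$C$. This integration is easily performed on the image by
using a scheme of numerical integration [20]. Then, we see
that

$$DF[X] = \int_C \left[u m_x + v m_y + \left(x'^2\frac{\partial u}{\partial x} + x'y'\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right) + y'^2\frac{\partial v}{\partial y}\right)m\right] ds, \qquad (3.18)$$

where $x' = dx/ds$ and $y' = dy/ds$. When the optical flow is given
by eqns (2.1), functionals $C_1[.]$, ..., $C_8[.]$ of eqn (3.2) become as
follows:

$$C_1[X] = \int_C m_x\,ds, \qquad C_2[X] = \int_C m_y\,ds,$$

$$C_3[X] = \int_C [x m_x + x'^2 m]\,ds, \qquad C_4[X] = \int_C [y m_x + x'y' m]\,ds,$$

$$C_5[X] = \int_C [x m_y + x'y' m]\,ds, \qquad C_6[X] = \int_C [y m_y + y'^2 m]\,ds, \qquad (3.19)$$

$$C_7[X] = \int_C [x^2 m_x + xy\,m_y + (2x x'^2 + y x'y' + x y'^2)\,m]\,ds,$$

$$C_8[X] = \int_C [xy\,m_x + y^2 m_y + (y x'^2 + x x'y' + 2y y'^2)\,m]\,ds.$$

Hence, $C_1[.]$, ..., $C_8[.]$ can be computed on the image plane by
using a scheme of numerical integration.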
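
One possible numerical integration scheme for these contour functionals is sketched below. It is an assumption-laden illustration, not the scheme of [20]: the contour is approximated by a closed polygon, tangents $x'$, $y'$ are taken from the polygon edges, the integrands are evaluated at edge midpoints, and the name `contour_functionals` and the unit-circle test contour are hypothetical.

```python
import numpy as np

def contour_functionals(pts, m, m_x, m_y):
    """Approximate C1..C8 of eqn (3.19) on a closed polygonal contour.

    pts: (N, 2) array of contour points in order; m, m_x, m_y: callables
    giving the weight function and its partial derivatives.
    """
    nxt = np.roll(pts, -1, axis=0)
    seg = nxt - pts
    ds = np.hypot(seg[:, 0], seg[:, 1])          # line element per edge
    xp, yp = seg[:, 0] / ds, seg[:, 1] / ds      # x' = dx/ds, y' = dy/ds
    mid = 0.5 * (pts + nxt)                      # midpoint rule
    x, y = mid[:, 0], mid[:, 1]
    M, Mx, My = m(x, y), m_x(x, y), m_y(x, y)
    integrands = [
        Mx,
        My,
        x * Mx + xp**2 * M,
        y * Mx + xp * yp * M,
        x * My + xp * yp * M,
        y * My + yp**2 * M,
        x**2 * Mx + x * y * My + (2 * x * xp**2 + y * xp * yp + x * yp**2) * M,
        x * y * Mx + y**2 * My + (y * xp**2 + x * xp * yp + 2 * y * yp**2) * M,
    ]
    return np.array([np.sum(g * ds) for g in integrands])

# Hypothetical test: unit circle with m = 1, so F[X] is the contour length.
t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
one = lambda x, y: np.ones_like(x)
zero = lambda x, y: np.zeros_like(x)
C = contour_functionals(circle, one, zero, zero)
```

With $m = 1$ the feature is the contour length, and a uniform dilation ($A = D = 1$, all other parameters zero) should change it at the rate $C_3 + C_6 = 2\pi$, which the sketch reproduces.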

Kanatani [20] also proposed the use of surface integration
inside the planar region $S$,

$$F[X] = \iint_S m(x,y)\,dx\,dy, \qquad (3.20)$$

of a fixed function $m(x,y)$. Now, integration is done over a
moving region $S$, not over a fixed window $W$. The change
rate is expressed in two ways, due to Green's theorem, as
follows:

$$DF[X] = \int_C (u y' - v x')\,m\,ds = \iint_S \left(\frac{\partial (um)}{\partial x} + \frac{\partial (vm)}{\partial y}\right) dx\,dy. \qquad (3.21)$$

When the optical flow is given by eqns (2.1), functionals $C_1[.]$,
..., $C_8[.]$ of eqn (3.2) become as follows:

$$C_1[X] = \int_C y' m\,ds = \iint_S m_x\,dx\,dy, \qquad C_2[X] = -\int_C x' m\,ds = \iint_S m_y\,dx\,dy,$$

$$C_3[X] = \int_C x y' m\,ds = \iint_S (m + x m_x)\,dx\,dy,$$

$$C_4[X] = \int_C y y' m\,ds = \iint_S y m_x\,dx\,dy,$$

$$C_5[X] = -\int_C x x' m\,ds = \iint_S x m_y\,dx\,dy, \qquad (3.22)$$

$$C_6[X] = -\int_C y x' m\,ds = \iint_S (m + y m_y)\,dx\,dy,$$

$$C_7[X] = \int_C (x^2 y' - xy\,x')\,m\,ds = \iint_S (3xm + x^2 m_x + xy\,m_y)\,dx\,dy,$$

$$C_8[X] = \int_C (xy\,y' - y^2 x')\,m\,ds = \iint_S (3ym + xy\,m_x + y^2 m_y)\,dx\,dy.$$

Hence, $C_1[.]$, ..., $C_8[.]$ are computed on the image plane as
either line integrals or surface integrals.


4. STEPWISE TRACING AND STEREO

According to the method described so far, the flow
parameters $u_0$, $v_0$, $A$, $B$, $C$, $D$, $E$ and $F$ can be extracted from
two (or more) consecutive images, and then the structure and
motion parameters $a$, $b$, $c$, $p$, $q$, $\omega_1$, $\omega_2$ and $\omega_3$ are determined
by analytical equations. As was shown, however, there
remain certain indeterminacies including the absolute depth $r$.
These indeterminacies can be removed if a sequence of images
is available and if the initial position of the surface is known
[19, 20]. This becomes possible if we note the fact that if a
plane $z = px + qy + r$ is moving with translation velocities $a$, $b$
and $c$ and rotation velocities $\omega_1$, $\omega_2$ and $\omega_3$ as described in
Section 2, the coefficients $p$, $q$ and $r$ change as

$$\frac{dp}{dt} = pq\,\omega_1 - (p^2 + 1)\,\omega_2 - q\,\omega_3, \qquad \frac{dq}{dt} = (q^2 + 1)\,\omega_1 - pq\,\omega_2 + p\,\omega_3,$$

$$\frac{dr}{dt} = c - pa - qb. \qquad (4.1)$$


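Suppose $p$, $q$ and $r$ are known at time $t$. Substitution of
eqns (2.2) in eqn (3.2) yields

$$DF[X] = a\,C_a[X] + b\,C_b[X] + c\,C_c[X] + \omega_1 C_{\omega_1}[X] + \omega_2 C_{\omega_2}[X] + \omega_3 C_{\omega_3}[X], \qquad (4.2)$$

where $C_a[.]$, $C_b[.]$, $C_c[.]$, $C_{\omega_1}[.]$, $C_{\omega_2}[.]$ and $C_{\omega_3}[.]$ are functionals
defined by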

^i-l='7P(/c'l-l-Pc»!-l-,c«l-!)-  c*!  l=777(/c:[i-PrsM'VC,i.|) 

Q'i--777(Ali^8[!-y,pc7i.iMr, ;.;)), 


Q>,[-:—  -(pQ,[.!+7Cs|.]  +  —  C3[.j), 


(4.3) 

,  I  r;; 

lCl 


''■...'i  cv  r, 


Since $p$, $q$ and $r$ are known, $C_a[.]$, ..., $C_{\omega_3}[.]$ are known
functionals. The left-hand side of eqn (4.2), i.e., the change rate
$DF[X]$ of feature $F[X]$, is obtained by a numerical differentiation
scheme as described earlier. Hence, if we use six or more
independent feature functionals, we obtain a set of simultaneous
linear equations of the form of eqn (4.2) to determine $a$, $b$,
$c$, $\omega_1$, $\omega_2$ and $\omega_3$. Then $p$, $q$ and $r$ at time $t + \delta t$ are
determined by integrating eqns (4.1) by some numerical integration
scheme like

$$p \leftarrow p + [pq\,\omega_1 - (p^2 + 1)\,\omega_2 - q\,\omega_3]\,\delta t, \qquad q \leftarrow q + [(q^2 + 1)\,\omega_1 - pq\,\omega_2 + p\,\omega_3]\,\delta t,$$

$$r \leftarrow r + [c - pa - qb]\,\delta t, \qquad (4.4)$$

or some other higher order scheme. This process is repeated
to determine the course of motion uniquely along time [19,
20]. During this process, small errors at each step may accumulate,
so that appropriate modifications are necessary once
in a while, say, by the direct method described earlier or some
other source of information.
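
The stepwise update of eqn (4.4) can be sketched as a forward Euler loop. This is a minimal illustration only: the function names `step_plane` and `trace_plane` are hypothetical, and the motion parameters $(a, b, c, \omega_1, \omega_2, \omega_3)$ are assumed to have already been solved for at each step (here they are simply held constant).

```python
def step_plane(p, q, r, motion, dt):
    """One forward Euler step of eqns (4.1) for the plane z = p*x + q*y + r."""
    a, b, c, w1, w2, w3 = motion
    dp = p * q * w1 - (p * p + 1.0) * w2 - q * w3
    dq = (q * q + 1.0) * w1 - p * q * w2 + p * w3
    dr = c - p * a - q * b
    return p + dp * dt, q + dq * dt, r + dr * dt

def trace_plane(p, q, r, motion, dt, steps):
    """Repeat the Euler step; errors accumulate slowly, as noted in the text."""
    for _ in range(steps):
        p, q, r = step_plane(p, q, r, motion, dt)
    return p, q, r

# Hypothetical example: pure rotation about the z-axis with unit angular
# velocity, traced for a quarter turn; the gradient (p, q) rotates with it.
import math
n = 10000
p_end, q_end, r_end = trace_plane(1.0, 0.0, 2.0,
                                  (0.0, 0.0, 0.0, 0.0, 0.0, 1.0),
                                  (math.pi / 2) / n, n)
```

A higher order scheme (for example a Runge-Kutta step) could replace `step_plane` without changing the surrounding loop.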

This method is also used to determine the surface orientation
and position $p$, $q$ and $r$ from stereo vision without
using point-to-point correspondence. If we move the camera
by $l$ in the negative $x$-direction, the object moves by $l$ in the
$x$-direction relative to the camera. In view of eqn (4.2), the
change rate $dF[X]/dl$ of feature $F[X]$ is equal to $C_a[X]$. Similarly,
$C_b[X]$ and $C_c[X]$ are directly obtained by moving the
camera in the $y$- and the $z$-direction and measuring the change
rate of feature $F[X]$. (In practice, of course, the camera need
not be moved if the necessary number of cameras are
appropriately positioned beforehand.) Then, the first three of
eqns (4.3) provide a set of simultaneous equations to solve for
$p$, $q$ and $r$, since $C_1[X]$, ..., $C_8[X]$ are also measured on the
image. First, $p$ and $q$ are given as a solution of


C'W  rs M+ J  C.W  C, \X\  C'[X] c<\x 1+1  C.\X ]  Cg[Al 
C,[>.]Q|^+ic,[.Y]C,i.Yl  C,\X\Ct\X\+Lch\X\Ct\X\ 


;Jh 


_  ’jcp 
~  1C'\> 


[jqc.Wd-cjxKCiW 

\^c2\x\+ct\.mx] 


+  C„LV]) 

+  C.IA])  ■ 


(4,5) 


and  r  is  given  by 

/C.I.V-pcUV^c;*] 
r~  C.-X] 


/C^-tC-jAI-gCalAI 


-/  (4,6) 


(If we use more than three independent feature functionals,
the camera need be moved in only one direction, say, in the
$x$-direction alone. However, this does not seem to be feasible
in view of noise susceptibility.)

In the orthographic approximation $f \to \infty$, eqns (4.3)
become


C.\'~-Cx\],  C,[.|-Cj|.|.  C,|.|=0, 

i--(pQM-*7CkM).  cj]^pc3l]+qcy 


(4.7) 


and the process goes similarly except that $c$ is not determined,
as is obvious for orthographic projection. If the feature
functionals that we use are invariant with respect to translations
as in (1) and (2) of the previous section, only three such
features are necessary to compute $\omega_1$, $\omega_2$ and $\omega_3$, which in turn
determine the trajectory of $p$ and $q$. Fig. 10 shows the trajectory
of the motion of Fig. 9 obtained by measuring the diameter
$D(\theta)$ [19]. However, special care should be taken when
$p = 0$ and $q = 0$, in which case both $C_{\omega_1}[X]$ and $C_{\omega_2}[X]$ vanish
and hence $\omega_1$ and $\omega_2$ are not determined. In this case, we
must use a higher order expression of the optical flow as
shown in [18, 19].




In the pseudo-orthographic approximation, the process
goes similarly except that $C_c[.]$ of eqns (4.3) is replaced by

(4.8)

Acknowledgement. The author wants to express his special
thanks to Professor Azriel Rosenfeld and Professor Larry S.
Davis at the University of Maryland for helpful comments.
He also wants to thank Professor Shun-ichi Amari at Tokyo
University, Dr. Allen Waxman at Thinking Machines Corporation
and Mr. Muralidhara Subbarao at the University of
Maryland for discussions and suggestions.


Fig. 9. Contours of a moving plane viewed orthographically. The
orientation of $C_0$ is assumed to be known.

Fig. 10. The true and the computed trajectory of the gradient $(p,q)$
obtained by measuring the diameter $D(\theta)$ of the contours of Fig. 9 at
10° intervals.


REFERENCES

[1] S. Ullman, The interpretation of structure from motion, Proc.
R. Soc. Lond., B-203 (1979), 405 - 428.

[2] H.-H. Nagel, Representation of moving rigid objects based on
visual observations, Computer, 14-8 (1981), 29 - 39.

[3] H. C. Longuet-Higgins, A computer algorithm for reconstructing
a scene from two projections, Nature, 293 (1981), 133 -
135.

[4] R. Y. Tsai and T. S. Huang, Uniqueness and estimation of
three-dimensional motion parameters of rigid objects with
curved surfaces, IEEE Trans. Pattern Anal. Machine Intell.,
PAMI-6 (1984), 13 - 27.

[5] K. Sugihara and N. Sugie, Recovery of rigid structure from
orthographically projected optical flow, Comput. Vision Graphics
Image Processing, 27 (1984), 309 - 323.

[6] D. A. Gordon, Static and dynamic visual fields in human space
perception, J. Opt. Soc. Am., 55 (1965), 1296 - 1303.

[7] W. F. Clocksin, Perception of surface slant and edge labels
from optical flow: A computational approach, Perception, 9
(1980), 253 - 269.

[8] J. J. Koenderink and A. J. van Doorn, Exterospecific component
of the motion parallax field, J. Opt. Soc. Am., 71
(1981), 953 - 957.

[9] H. C. Longuet-Higgins, The visual ambiguity of a moving
plane, Proc. R. Soc. Lond., B-223 (1984), 165 - 175.

[10] K. Kanatani, Analysis of Structure and Motion from Optical
Flow: Part I Orthographic Projection, Technical Report,
University of Maryland, 1985.

[11] K. Kanatani, Analysis of Structure and Motion from Optical
Flow: Part II Central Projection, Technical Report, University
of Maryland, 1985.

[12] K. Kanatani, Analysis of Structure and Motion from Optical
Flow: Part III Invariant Decomposition, Technical Report,
University of Maryland, 1985.

[13] S. Ullman, The Interpretation of Visual Motion, MIT Press,
Cambridge, Mass., 1979.

[14] R. Jain, Dynamic scene analysis using pixel-based processes,
Computer, 14-8 (1981), 12 - 18.

[15] W. B. Thompson, Lower-level estimation and interpretation of
visual motion, Computer, 14-8 (1981), 20 - 28.

[16] B. K. P. Horn and B. G. Schunck, Determination of optical
flow, Artif. Intell., 17 (1981), 185 - 203.

[17] J. M. Prager and M. A. Arbib, Computing optic flow: the
MATCH algorithm and prediction, Comput. Vision Graphics
Image Processing, 24 (1983), 271 - 304.

[18] K. Kanatani, Detection of surface orientation and motion from
texture by a stereological technique, Artif. Intell., 23 (1984),
213 - 237.

[19] K. Kanatani, Tracing planar surface motion from projection
without knowing the correspondence, Comput. Vision Graphics
Image Processing, 29 (1985), 1 - 12.

[20] K. Kanatani, Detecting the motion of a planar surface by line
and surface integrals, Comput. Vision Graphics Image Processing,
29 (1985), 13 - 22.

[21] S. Amari, Invariant structures of signal and feature spaces in
pattern recognition problems, RAAG Memoirs, 4 (1968), 553 -
566.

[22] S. Amari, Feature spaces which admit and detect invariant signal
transformations, Proc. 4th Int. Conf. Pattern Recog., 1978,
pp. 452 - 456.

[23] A. P. Witkin, Recovering surface shape and orientation from
texture, Artif. Intell., 17 (1981), 17 - 45.

[24] M. G. Kendall and P. A. Moran, Geometrical Probability,
Charles Griffin, London, 1963.

[25] R. T. DeHoff and F. W. Rhines, Quantitative Microscopy,
McGraw-Hill, New York, 1968.

[26] E. E. Underwood, Quantitative Stereology, Addison-Wesley,
Reading, Mass., 1970.

[27] L. A. Santalo, Integral Geometry and Geometric Probability,
Addison-Wesley, Reading, Mass., 1976.

[28] E. R. Weibel, Stereological Methods, Vols. 1, 2, Academic
Press, New York, 1979, 1980.

[29] K. Kanatani, Distribution of directional data and fabric tensors,
Int. J. Engng Sci., 22 (1984), 149 - 164.

[30] K. Kanatani, Stereological determination of structural anisotropy,
Int. J. Engng Sci., 22 (1984), 531 - 546.


APPENDIX A

If  we  substitute  eqns  (2.2)  in  eqns  (2.3),  we  obtain 

t/.-igiSL, 

r=p^r- <vi-  P.a+ciP2c , 

J+T  J+T 

S=jxjri-<vrJy^-+t.<V2-pur 


(A.l) 


ir  1  ep  j  1  eq 

K~r^akH~r'+i£) 


If we put $V = a + ib$, $P = p + iq$ and $W = \omega_1 + i\omega_2$, these equations
are rewritten as

u*-4L 


R+iT^2Us-jicrP[\v'+jaS). 

S=-tf\W-jUa),  K.-iw.^-v 


(A.2) 


1  Putting 


P^=  —r— — ,  W^W-~'d0, 

f+r  ! 

the  above  equations  are  further  rewritten  as 

i — Ill. - 

r\l,,=(2iJi-Ii)-i2c’+T). 


(A3) 

(A.l) 

(A.5) 

(AS) 


P\V  =  ,S.  PPlU'-JK-I" 

Since  V  is  given  by  eqn  (A. 4),  the  remaining  equations  are  the 
equations  to  determine  P.  P,  IV’  and  u.’3. 

First,  we  check  whether  P  — 0  or  not.  If  so,  we  have 
»’=.(/A-l4/,0  from  the  second  of  eqns  (A. 6).  Then, 
P--S’  (fK-  U0,  J)  from  the  first.  We  can  conclude  P~0  if  i?nd 


-  'J 


Fig.  A  Existence  and  uniqueness  of  nonzero  P. 


only  if  these  IV  and  P  satisfy  />IV,=(2w3-/?)-i7'  obtained 
from  eqn  (A.5).  If  this  is  satisfied  (within  a  certain  threshold), 
ujj  is  given  by  u>3=(/?+Rej/5lV'jj /2. 

Suppose  we  have  already  checked  that  P  is  not  zero. 
The  first  of  eqns  (A .6)  is  rewritten  as  (c75)(-ilV)=c,S.  Hence, 
eqns  (A. 6)  means  that  PP  and  -iW1  are  the  two  roots  of  the 
quadratic  equation 

X*-LX+PS=' 0  ( L=fK-U0/f\ ).  (A. 7) 

Hence,  P  and  W  are  given  as  functions  of  P  by 


P[P)=^(L±  A<-aPS),  nP)=±(L±\/LUPS).  (A.8) 
Then,  eqn  (A  C)  gives  u,  ui  function  of  p  by 

W3=y(R+Ro!/V)»V(c'n), 

and  the  equation  to  determine  p  is 

C^-I(  TVImj/V^V)'!) 

Eqn  (A .10)  d-fines  a  unique  equation  although  two  sets 
of  solutions  exist  for  P,  IV  and  w3.  To  see  this,  let  .V,  und  Xj 
be  the  two  roots  of  eqn  (A  .7).  IT  we  choose  P—.\\/P  and 
IV=t,Vj,  we  have  Im j/,lV'j=-Re|.V).VJ|/c/,  while  if  we  choose 
P=--X2/P  and  IV=|A,,  we  have  Im[PlV*]=- Re^Vjl/c'. 
Since  Re|Jf|.Yjj=Rej.Vj.Yjj,  Im[PH**)  of  eqn  (A. 10)  remains  the 
same  for  both  cases. 

II  we  actually  substitute  eqns  (A  S)  in  eqn  (A. 10),  we 
obtain 

v/l6|SlV2-8Reifi-5,lc'+(fi(t=-Sc'2-4rc'+)fi|5  (All) 

The  left-hand  side  is  a  smooth  concave  function  (or  a  constant 
if  .9--0)  passing  through  (0,|L|"),  while  the  right-hand  side  is  a 
smooth  convex  quadratic  function  also  passing  through  (0,|L|“) 
(Fig  A).  Since  we  know  that  Py^O,  there  exists  a  single 
unique  non-zero  solution  P. 

If  we  take  the  squares  of  both  sides,  we  obtain  a  cubic 
equation 


P*+TP*.^{  VJ+i(r 


'  T\L\‘)—G  (A.  12) 


L  T  aIa  ,A  *4 


. 


t,-. 

v. 


J"*--.. 

iU-JBL 


(A.9) 

\%V-V 

*  %*  V* 

*  C  V* 

»  •  m 

>.%Y 
*  o 

A.  10) 

% 


>  * » ■ » *  ^  * 

•  m  «  .  1 *  .  e,  « 


L-JL 


From Fig. A, it is easy to see that this cubic equation has
three real roots and that the middle one is the desired root.
(The other two roots were introduced by squaring of both
sides.)


APPENDIX  B 

Since $u_0 = a$ and $v_0 = b$, we only need to determine $p$, $q$,
$\omega_1$, $\omega_2$ and $\omega_3$. ($c$ and $r$ are indeterminate due to orthography.)
If we substitute eqns (2.5) in eqns (2.3), we obtain

$$T = p\omega_2 - q\omega_1, \qquad R = 2\omega_3 - p\omega_1 - q\omega_2,$$

$$S = (p\omega_2 + q\omega_1) + i(q\omega_2 - p\omega_1). \qquad (B.1)$$

The first two equations are combined into a single equation

$$R + iT = 2\omega_3 - (p\omega_1 + q\omega_2) + i(p\omega_2 - q\omega_1). \qquad (B.2)$$

If we put $P = p + iq$ and $W = \omega_1 + i\omega_2$, the equations become

$$PW^* = 2\omega_3 - (R + iT), \qquad PW = iS. \qquad (B.3)$$

Since $|PW^*| = |PW|$, the right-hand sides must have the same
modulus, i.e.,

$$(2\omega_3 - (R + iT))(2\omega_3 - (R - iT)) = SS^*, \qquad (B.4)$$

from which $\omega_3$ is given by

$$\omega_3 = \frac{1}{2}\left(R \pm \sqrt{SS^* - T^2}\right). \qquad (B.5)$$

From eqns (B.3), we immediately see that if $W$ and $P$ are
a solution, then so are $kW$ and $P/k$ where $k$ is an arbitrary
non-zero real constant. Hence, we do not lose generality if we
put $W = k\exp(i\arg(W))$, where $k$ is an indeterminate scale factor.
Eliminating $P$ from eqns (B.3) by taking ratios of both
sides, we obtain

$$\frac{W}{W^*} = \frac{iS}{2\omega_3 - (R + iT)}. \qquad (B.6)$$

Taking the argument of both sides yields

$$2\arg(W) = \frac{\pi}{2} + \arg(S) - \arg(2\omega_3 - (R + iT)) \pmod{2\pi}, \qquad (B.7)$$

and hence

$$\arg(W) = \frac{\pi}{4} + \frac{1}{2}\arg(S) - \frac{1}{2}\arg(2\omega_3 - (R + iT)) \pmod{\pi}. \qquad (B.8)$$

However, we can ignore the mod $\pi$ by allowing the scale factor
$k$ to be negative. Then, $W$ is given by the second of eqns
(2.6). Finally, $P$ is given from the second of eqns (B.3) by
$P = iS/W$, and hence it is written as in eqns (2.6).
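
The procedure of this appendix can be sketched as follows. It is only an illustration under stated assumptions: the helper names `flow_invariants` and `solve_orthographic` are hypothetical, the invariants are taken in the form of eqn (B.1), and the indeterminate scale factor $k$ is fixed to 1 so that $W$ has unit modulus. Both sign choices of eqn (B.5) are returned.

```python
import cmath
import math

def flow_invariants(p, q, w1, w2, w3):
    """T, R, S in the form of eqn (B.1), used here to generate test data."""
    T = p * w2 - q * w1
    R = 2.0 * w3 - p * w1 - q * w2
    S = complex(p * w2 + q * w1, q * w2 - p * w1)
    return T, R, S

def solve_orthographic(T, R, S):
    """Return the two candidate solutions (w3, W, P) of eqns (B.3)-(B.8).

    W is returned with unit modulus (scale factor k = 1, an assumption);
    the true W and P are k*W and P/k for some real k.
    """
    root = math.sqrt(max(abs(S) ** 2 - T ** 2, 0.0))
    solutions = []
    for sign in (+1.0, -1.0):
        w3 = 0.5 * (R + sign * root)                      # eqn (B.5)
        rhs = 2.0 * w3 - complex(R, T)                    # right side of P W* in (B.3)
        arg_w = (math.pi / 4 + 0.5 * cmath.phase(S)
                 - 0.5 * cmath.phase(rhs))                # eqn (B.8)
        W = cmath.exp(1j * arg_w)
        P = 1j * S / W                                    # second of eqns (B.3)
        solutions.append((w3, W, P))
    return solutions

# Hypothetical ground truth used to exercise the sketch.
p, q, w1, w2, w3 = 0.3, -0.2, 0.1, 0.4, 0.25
T, R, S = flow_invariants(p, q, w1, w2, w3)
candidates = solve_orthographic(T, R, S)
```

One of the two candidates reproduces the true $\omega_3$, and for that candidate the ratio of the true $W$ to the returned $W$ is a real number, which is the scale factor $k$ of the text.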


APPENDIX  C 

If the pseudo-orthographic approximation (2.7) is
adopted, eqns (A.6) are replaced by

P\l'=is,  W=ifK  (C  1) 

Hence,  W  is  explicitly  obtained,  and  P—  «.?/  '**.-=  S/(/K-Ua/J). 
The  remaining  u.-3  and  c  are  given  from  eqn  ( A .5)  as 

W3=i(«+Re|PVV"!),  c=41(r+ImiPIv’*|;-  (0  2) 


If  we  note  that 


P  W*  •=  -  isihJ^iL = -  ,5e- 

//v-ty/ 


APPENDIX  D 

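Optical flows are observed in the form of eqns (2.1) with
respect to an $xy$-coordinate system arbitrarily fixed on the
image plane. The choice of the coordinate system is completely
arbitrary. Suppose we use an $x'y'$-coordinate system
obtained by rotating the $xy$-coordinate system by angle $\theta$
counterclockwise. Then, the optical flow must bear the same
form

$$u' = u_0' + A'x' + B'y' + (E'x' + F'y')\,x',$$

$$v' = v_0' + C'x' + D'y' + (E'x' + F'y')\,y', \qquad (D.1)$$

because we are still observing the rigid motion of a plane. In
other words, the optical flow is form invariant. Here, the old
coordinates $x$, $y$ and the new coordinates $x'$, $y'$ are related by

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \qquad (D.2)$$

Since the velocity components are transformed as a vector,
the old components $u$, $v$ and the new components $u'$, $v'$ are
also related by

$$\begin{pmatrix} u' \\ v' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}. \qquad (D.3)$$

If we substitute eqns (D.2) and (D.3) into eqns (D.1) and compare
the result with eqns (2.1), we find that $u_0$, $v_0$ are
transformed as a vector, $A$, $B$, $C$, $D$ are transformed as a tensor,
and $E$, $F$ are transformed as a vector, namely,

$$\begin{pmatrix} u_0' \\ v_0' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} u_0 \\ v_0 \end{pmatrix}, \qquad (D.4)$$

$$\begin{pmatrix} A' & B' \\ C' & D' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad (D.5)$$

$$\begin{pmatrix} E' \\ F' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} E \\ F \end{pmatrix}. \qquad (D.6)$$

Eqns (D.4), (D.5) and (D.6) are a linear mapping from $u_0$,
$v_0$, $A$, $B$, $C$, $D$, $E$, $F$ to $u_0'$, $v_0'$, $A'$, $B'$, $C'$, $D'$, $E'$, $F'$, and this
mapping is a representation, i.e., a homomorphism, of the 2D
rotation group. As is well known in group representation
theory, any representation is reduced to one-dimensional
irreducible representations due to Schur's lemma, since the 2D
rotation group is compact and Abelian. In fact, if we define
$U_0$, $T$, $R$ and $S$ as eqns (2.3), the above mapping is rewritten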


we  obtain  eqns  (2  8). 


©  Ur!— 


As Hermann Weyl pointed out, irreducible representations
describe physical quantities which are inherent to the
phenomenon and independent of the choice of the coordinate
system. Indeed, the above parameters describe geometrical
characteristics of the flow itself familiar in fluid dynamics as is
stated in the text. In particular, $T$, $R$ and $S$ are obtained by
resolving the matrix composed of $A$, $B$, $C$ and $D$ into the
scalar part, the deviator (or traceless symmetric) part and the
antisymmetric (or skew) part. This is not a coincidence;
according to the general theorem of Weyl, all irreducible
representations of any tensor representation of $SO(n)$ are
obtained by a combination of these decomposition processes.
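
The decomposition can be checked numerically. The sketch below is an illustration under assumptions: eqns (2.3) are not reproduced in this appendix, so the definitions $T = A + D$, $R = C - B$ and $S = (A - D) + i(B + C)$ are assumed here because they agree with eqns (B.1); the function names and the sample matrix are hypothetical.

```python
import numpy as np

def decompose(M):
    """Scalar, skew and deviator parts of the flow matrix [[A, B], [C, D]]
    (assumed forms of T, R and S)."""
    (A, B), (C, D) = M
    T = A + D                    # scalar (trace) part
    R = C - B                    # antisymmetric (skew) part
    S = complex(A - D, B + C)    # deviator (traceless symmetric) part
    return T, R, S

def rotate_flow_matrix(M, theta):
    """Tensor transformation of eqn (D.5) under a coordinate rotation by theta."""
    c, s = np.cos(theta), np.sin(theta)
    Q = np.array([[c, s], [-s, c]])
    return Q @ M @ Q.T

# Hypothetical flow parameters A, B, C, D and rotation angle.
M = np.array([[0.3, -0.7], [1.1, 0.4]])
theta = 0.6
T0, R0, S0 = decompose(M)
T1, R1, S1 = decompose(rotate_flow_matrix(M, theta))
```

In this sketch $T$ and $R$ are unchanged by the rotation, while $S$ is only multiplied by the unit complex number $e^{-2i\theta}$, so each quantity transforms by a one-dimensional representation as the text states.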




ROBUST ESTIMATION OF 3-D MOTION
PARAMETERS FROM A SEQUENCE OF
IMAGE FRAMES USING REGULARIZATION¹

Gerard Medioni and Yoshio Yasumoto²

Intelligent  Systems  Group 
Electrical  Engineering 
PHE  ’  26  me  0273 
University  of  Southern  California 
Los Angeles, CA 90089-0273


ABSTRACT 


In this study, we look at the issue of accurate
estimation of the 3-D motion parameters of a rigid body
from a sequence of synthetic images and relate the effect
of some parameters to the shape of an error function. We
first consider the case where only a small set of
corresponding points is identified and suggest that a
technique called regularization improves the quality and
stability of a solution. We then observe that, if more pairs
of corresponding points are available, the error function
becomes smooth and the solution stable. Finally, we try
to improve the quality of estimation by considering more
than 2 consecutive frames for a moving camera looking at
a stationary scene and summing the error functions
obtained for any 2 consecutive frames. Surprisingly
enough, this technique does not improve stability unless
we use regularization again.

Keywords: Dynamic scene analysis, image sequence
analysis, motion estimation.

1.  INTRODUCTION 

In recent years, many studies on the estimation of the
3-D motion of a rigid body have been performed. Some
early works made simplifying assumptions, such as
rotation around a fixed axis [22], [17], [3], orthographic
projection [20], or translational motion only [5, 10]. In most
formalisms, the authors are led to solving a set of nonlinear
equations; recently Tsai and Huang [19] and
Longuet-Higgins [11] independently obtained closed form
solutions and a set of linear equations.


¹ This research was supported, in part, by the Defense Advanced
Research Projects Agency and was monitored by the Air Force Wright
Aeronautical Laboratories under contract F33615-84-1404, DARPA order
no. 3119.

² Yoshio Yasumoto is an Engineer for Matsushita Electric Industrial Co.
Ltd. and was a visiting scholar at USC.


The  general  paradigm  for  time-varying  imagery 
analysis  is  as  follows: 

-  feature  extraction 

-  feature  matching 

-  motion  parameters  estimation  (and  depth  recovery) 

In the formalism of optical flow, the first 2 steps are
merged in the computation of the optical flow. If the
image contains two or more objects moving
independently, a segmentation procedure becomes
necessary. In this study, we only look at the third step,
the estimation of the 3-D motion parameters for a single
rigid object, and our concern is the applicability to real
world images, even though we only present results on
synthetic imagery so far. Very few authors reported on
such experiments: Dreschler and Nagel [6] only estimated
image displacement; Roach and Aggarwal [15] used TV
images but no results were given; finally, Fang and Huang
[7] used feature points with subpixel accuracy, but the
results are not very accurate.

The ability to process real world images implies the
ability to cope with some amount of noise in the input
data, such as noise from the sensors, the digitization and
feature extraction processes.

In that respect, the formulation developed by Tsai and
Huang [19] is not appropriate, as they report 54% error in
the estimation of the translation with 8 corresponding
points for 1% perturbation of the input. Bruss and Horn
[4] use the least square criterion to minimize the
difference between measured and predicted displacement.
This technique is harder to implement, but provides some
tolerance to noise. Similarly, Adiv [1] uses a modified
least square error function to obtain the value of the
translation component, and then solves a set of linear
equations to find the rotation vector. Here we identify the
problem of estimating 3-D motion as an ill-posed
inverse problem. This suggests the application of a
general technique called regularization, developed by
Russian mathematicians in the last twenty years [2, 8]
and suggested by Poggio [14] as a natural mechanism in
early vision.




The next section introduces the ideas of regularization,
section 3 gives the notations and conventions, and section 4
shows the influence of the regularization factor when only
8 corresponding pairs are known, with and without
quantization noise. Section 5 shows that if more pairs are
known, then the error function is smooth and therefore
regularization does not improve the result. Section 6
presents the application of the technique to 5 consecutive
frames with constant motion parameters, and we express
our conclusions in section 7.


2.  REGULARIZATION 


The formulation of many practical problems leads to
ill-posed problems. By definition, a problem is
well-posed if

- there exists a solution

- the solution is unique

- the dependence of the solution on the input data is
continuous

From this definition, we can see that the problem of
estimation of 3-D motion parameters is ill-posed, since the
solution is not robust in the presence of noise and is not
unique. Furthermore, it is an inverse problem, and most
inverse problems are ill-posed.

Regularization proposes to "solve" these problems by
restricting the space of acceptable solutions by imposing
additional constraints.

In general, for inverse problems, one formulation
with regularization is as follows.

Let our input data be y. We are looking for a solution z
such that Az = y. To do that, we choose a norm ||.|| and a
stabilizing functional ||Pz||. We then reformulate the
problem as:

find a function z that minimizes

$$ \|Az - y\|^2 + \lambda \|Pz\|^2 $$

The first term expresses the closeness of the solution
to the input data, the second expresses the degree of
regularization, or the additional constraints, and the factor
λ controls the compromise between these two terms.
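The general formulation can be illustrated on a small linear problem. The following is a minimal sketch, assuming a finite-dimensional A and the Euclidean norm, so that the minimizer satisfies the normal equations (A^T A + λ P^T P) z = A^T y. The function name `tikhonov_solve` and the toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def tikhonov_solve(A, y, lam, P=None):
    """Minimize ||A z - y||^2 + lam * ||P z||^2 via the normal equations."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = A.shape[1]
    # Default stabilizer (an assumption): P = identity, i.e. penalize ||z||^2.
    P = np.eye(n) if P is None else np.asarray(P, dtype=float)
    lhs = A.T @ A + lam * (P.T @ P)
    rhs = A.T @ y
    return np.linalg.solve(lhs, rhs)

# Nearly collinear columns make the unregularized problem ill-conditioned.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
z_true = np.array([1.0, 1.0])
y_noisy = A @ z_true + np.array([1e-3, -1e-3, 5e-4])  # small perturbation

z_plain = tikhonov_solve(A, y_noisy, lam=0.0)   # no regularization
z_reg = tikhonov_solve(A, y_noisy, lam=1e-3)    # with regularization

print("error without regularization:", np.linalg.norm(z_plain - z_true))
print("error with regularization:   ", np.linalg.norm(z_reg - z_true))
```

In this toy case a perturbation of order 1e-3 in y moves the unregularized solution far from z_true, while the regularized one stays close. That is the lack of continuous dependence on the data that the definition of well-posedness rules out.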

In our problem, we wish to incorporate the fact that Ω
and T should not change wildly when the input data are
slightly perturbed. The simplest functional would therefore
be of the form

$$ \|Pz\|^2 = \lambda_1 \Omega_x^2 + \lambda_2 \Omega_y^2 + \lambda_3 \Omega_z^2 + \mu_1 T_x^2 + \mu_2 T_y^2 + \mu_3 T_z^2 $$

The translation vector, however, is only defined up to a
scale factor, making it difficult to include T in the
regularization term. For simplicity, we also choose
λ1 = λ2 = λ3, therefore applying the regularization to the
length of the vector Ω.

Therefore

$$ \|Pz\| = \sqrt{\Omega_x^2 + \Omega_y^2 + \Omega_z^2} $$

Intuitively, the explanation is as follows. One of the
assumptions is that the rotation angles are small enough
to make the approximation sin α ≈ α, but the solving
procedure does not check this condition. The new
functional can be regarded as: find the solution minimizing
the error, subject to the constraint that the vector Ω is
small. The second term is then a Lagrange multiplier
term expressing the constraint.


3.  NOTATIONS  AND  CONVENTIONS 


In the object-coordinate frame, the following is the
fundamental equation for a moving rigid object:

$$ (x' \;\, y' \;\, z')^T = R \, (x \;\, y \;\, z)^T + T \tag{1} $$

where (x y z) denotes the object-space coordinates of
a point P before motion, (x' y' z') denotes the object-space
coordinates of P after motion, R is a 3 x 3
orthonormal matrix of the first kind whose elements are
expressed as a function of the rotation angle θ and of the
axis of rotation n1, n2 and n3, and T = (T_x, T_y, T_z)^T.

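If we assume small rotation angles, matrix R is
expressed by:

$$ R = \begin{pmatrix} 1 & -\Omega_z & \Omega_y \\ \Omega_z & 1 & -\Omega_x \\ -\Omega_y & \Omega_x & 1 \end{pmatrix} \tag{2} $$

where Ω_x, Ω_y, Ω_z denote the rotation angles around the
x, y and z axes respectively, as shown in Figure 1.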
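The quality of the small-angle assumption can be checked numerically. The sketch below assumes the standard first-order form R ≈ I + [Ω]x and compares it with the exact rotation given by the Rodrigues formula for the same rotation vector. The two descriptions (a rotation vector versus successive rotations about the axes) agree to first order only, and the function names and sample angles are illustrative assumptions.

```python
import numpy as np

def small_angle_rotation(omega):
    """First-order rotation matrix I + [omega]_x (angles in radians)."""
    wx, wy, wz = omega
    return np.array([[1.0, -wz,  wy],
                     [ wz, 1.0, -wx],
                     [-wy,  wx, 1.0]])

def exact_rotation(omega):
    """Rodrigues formula for the rotation vector omega (radians)."""
    omega = np.asarray(omega, dtype=float)
    angle = np.linalg.norm(omega)
    if angle == 0.0:
        return np.eye(3)
    k = omega / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# Angles of the order of a degree or two (illustrative values).
omega = np.deg2rad([1.4, 1.5, 1.5])
gap = np.abs(small_angle_rotation(omega) - exact_rotation(omega)).max()

# The same comparison for a rotation that is clearly not small.
omega_big = np.deg2rad([30.0, 30.0, 30.0])
gap_big = np.abs(small_angle_rotation(omega_big) - exact_rotation(omega_big)).max()

print("max entry difference, small angles:", gap)
print("max entry difference, large angles:", gap_big)
```

The discrepancy grows roughly with the square of the rotation angle, which is why a solver that never checks the small-angle condition benefits from a term that keeps Ω small.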

Figure 1: Coordinate System of Camera and Image Plane


The relations between object-coordinates and
image-coordinates are as follows:

$$ X = \frac{f x}{z}, \qquad X' = \frac{f x'}{z'} \tag{3-1} $$

$$ Y = \frac{f y}{z}, \qquad Y' = \frac{f y'}{z'} \tag{3-2} $$

where f denotes the focal length and (X, Y), (X', Y')
correspond to the projection of a point onto the image
plane before and after motion.

Now, let (α, β) be the displacement vector (X'-X, Y'-Y).
Using expressions (1) through (3), it is possible to derive
an analytical expression of this vector at each point as a
function of Ω, T and z:


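$$ \alpha = \frac{-\Omega_x XY + \Omega_y (f^2 + X^2) - \Omega_z fY + (T_x f - T_z X)\, f/z}{f + \Omega_x Y - \Omega_y X + T_z f/z} \tag{4-1} $$

$$ \beta = \frac{-\Omega_x (f^2 + Y^2) + \Omega_y XY + \Omega_z fX + (T_y f - T_z Y)\, f/z}{f + \Omega_x Y - \Omega_y X + T_z f/z} \tag{4-2} $$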


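By algebraic manipulation, and with no approximation,
we get:

$$ \alpha = -\Omega_x \frac{X'Y}{f} + \Omega_y \left( f + \frac{XX'}{f} \right) - \Omega_z Y + \frac{T_x f - T_z X'}{z} \tag{5-1} $$

$$ \beta = -\Omega_x \left( f + \frac{YY'}{f} \right) + \Omega_y \frac{XY'}{f} + \Omega_z X + \frac{T_y f - T_z Y'}{z} \tag{5-2} $$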

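By replacing as follows:

$$ \alpha_R = -\Omega_x \frac{XY}{f} + \Omega_y \left( f + \frac{X^2}{f} \right) - \Omega_z Y \tag{6-1} $$

$$ \alpha_T = \frac{T_x f - T_z X}{z} \tag{6-2} $$

$$ \beta_R = -\Omega_x \left( f + \frac{Y^2}{f} \right) + \Omega_y \frac{XY}{f} + \Omega_z X \tag{6-3} $$

$$ \beta_T = \frac{T_y f - T_z Y}{z} \tag{6-4} $$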

We decompose the motion into independent rotation
and translation terms:

$$ \alpha = \alpha_R + \alpha_T \tag{7-1} $$

$$ \beta = \beta_R + \beta_T \tag{7-2} $$
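The decomposition into rotation and translation terms can be compared against an exact projection. The following is a minimal sketch, assuming the first-order rotational and translational terms take the usual small-motion form (rotational part linear in Ω, translational part proportional to 1/z). The function names and the numeric values are illustrative assumptions, not data from the paper.

```python
import numpy as np

def rotational_flow(X, Y, omega, f):
    """First-order image displacement caused by the rotation alone."""
    wx, wy, wz = omega
    alpha_r = -wx * X * Y / f + wy * (f + X * X / f) - wz * Y
    beta_r = -wx * (f + Y * Y / f) + wy * X * Y / f + wz * X
    return alpha_r, beta_r

def translational_flow(X, Y, T, z, f):
    """First-order image displacement caused by the translation alone."""
    tx, ty, tz = T
    return (tx * f - tz * X) / z, (ty * f - tz * Y) / z

def exact_displacement(point, omega, T, f):
    """Project the point before and after the small-angle rigid motion."""
    wx, wy, wz = omega
    R = np.array([[1.0, -wz, wy], [wz, 1.0, -wx], [-wy, wx, 1.0]])
    x, y, z = point
    xp, yp, zp = R @ np.asarray(point, dtype=float) + np.asarray(T, dtype=float)
    return f * xp / zp - f * x / z, f * yp / zp - f * y / z

f = 500.0
point = (30.0, -20.0, 1000.0)
omega = np.deg2rad([1.0, 0.5, -1.0])
T = (5.0, 5.0, 10.0)

X, Y = f * point[0] / point[2], f * point[1] / point[2]
alpha_r, beta_r = rotational_flow(X, Y, omega, f)
alpha_t, beta_t = translational_flow(X, Y, T, point[2], f)
alpha_exact, beta_exact = exact_displacement(point, omega, T, f)

print("alpha: decomposed", alpha_r + alpha_t, "exact", alpha_exact)
print("beta:  decomposed", beta_r + beta_t, "exact", beta_exact)
```

For motions of this size the two agree to within a small fraction of a pixel. The gap comes from the denominator that the decomposition drops, which stays close to 1 when T_z is small relative to z and the rotation angles are small.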

We measure the quantity (α, β) at each point and try
to derive Ω_x, Ω_y, Ω_z, T and z.

As explained in the previous section, we choose the
function to be minimized as a sum of 2 terms: the first
one is the error measured in the least square sense, and
the second term is the regularizing factor:

$$ \sum_{i=1}^{n} \left[ (\alpha_i - \alpha_{R_i} - \alpha_{T_i})^2 + (\beta_i - \beta_{R_i} - \beta_{T_i})^2 \right] + \lambda (\Omega_x^2 + \Omega_y^2 + \Omega_z^2) \tag{8} $$

where α_i and β_i denote the displacement vector for the
i-th point, which can be obtained from the consecutive
images at each point; the first and second terms are the
squares of the difference between the predicted and
computed displacement, and the third term is the
regularization function.

(Footnote, partly illegible in the source: this expression is
obtained given T_z << z and Ω_x, Ω_y small.)

The problem is to obtain the three rotation angles Ω_x, Ω_y,
Ω_z, the translation vector T, and the depth z at each point.

It is impossible to derive the absolute values of the
translation vector T and of the depth z, but if the
translation vector has length r not equal to 0, we can
obtain the direction of T. Let us introduce a unit
translation vector U and the relative depth d_i:

$$ (U_x, U_y, U_z) = (T_x, T_y, T_z) / r \tag{9-1} $$

$$ d_i = r / z_i, \qquad i = 1, \ldots, n \tag{9-2} $$

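We can then rewrite equation (8) as follows:

$$ \sum_{i=1}^{n} \left[ (\alpha_i - \alpha_{R_i} - \alpha_{U_i} d_i)^2 + (\beta_i - \beta_{R_i} - \beta_{U_i} d_i)^2 \right] + \lambda (\Omega_x^2 + \Omega_y^2 + \Omega_z^2) \tag{10} $$

where

$$ \alpha_{U_i} = U_x f - U_z X_i = \alpha_{T_i} / d_i \tag{11-1} $$

$$ \beta_{U_i} = U_y f - U_z Y_i = \beta_{T_i} / d_i \tag{11-2} $$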


Thus, we have to compute the rotation angles, the unit
translation vector U and the relative depth d_i at each
point.

We can eliminate d_i by taking the first derivative of
equation (10) with respect to d_i.

The derivative is

$$ 2 \left[ -(\alpha_i - \alpha_{R_i}) \alpha_{U_i} - (\beta_i - \beta_{R_i}) \beta_{U_i} + (\alpha_{U_i}^2 + \beta_{U_i}^2)\, d_i \right] \tag{12} $$

Setting this derivative to zero, we have:

$$ d_i = \frac{(\alpha_i - \alpha_{R_i}) \alpha_{U_i} + (\beta_i - \beta_{R_i}) \beta_{U_i}}{\alpha_{U_i}^2 + \beta_{U_i}^2} \tag{13} $$

unless α_{U_i}^2 + β_{U_i}^2 = 0.

We have to take into account the constraint that all d_i
should be positive, since all objects are on one side of the
camera. If even one point is negative, it may be
appropriate to assume that d_i should be zero [1]. In the
searching procedure mentioned later, we eliminate values
of the parameters leading to points with negative d_i by
checking this condition at each point i.
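The elimination of the relative depth is a one-dimensional least-squares problem per point. The sketch below assumes the per-point residual is (a - α_U d)^2 + (b - β_U d)^2, with a and b the measured displacement minus its rotational part. The helper names and the sample numbers are illustrative assumptions.

```python
import numpy as np

def relative_depth(alpha, beta, alpha_r, beta_r, alpha_u, beta_u):
    """Closed-form d minimizing (a - alpha_u d)^2 + (b - beta_u d)^2."""
    a = alpha - alpha_r
    b = beta - beta_r
    denom = alpha_u ** 2 + beta_u ** 2
    if denom == 0.0:
        raise ValueError("alpha_u and beta_u are both zero; d is undefined")
    delta = a * alpha_u + b * beta_u  # numerator; carries the sign of d
    return delta / denom, delta

def residual(alpha, beta, alpha_r, beta_r, alpha_u, beta_u, d):
    """Squared data error of one point for a given relative depth d."""
    return (alpha - alpha_r - alpha_u * d) ** 2 + (beta - beta_r - beta_u * d) ** 2

# Illustrative measurements for a single point.
obs = dict(alpha=6.5, beta=-6.3, alpha_r=4.2, beta_r=-9.0,
           alpha_u=235.0, beta_u=260.0)

d_star, delta = relative_depth(**obs)
r_star = residual(d=d_star, **obs)

print("relative depth:", d_star, "admissible:", delta > 0)
print("residual at optimum:", r_star)
```

A point is admissible only when the numerator is positive, since the denominator is a sum of squares. This is why checking the sign of the numerator alone is enough to reject candidate motion parameters.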


We therefore evaluate expression δ_i, which is the
numerator of equation (13):

$$ \delta_i = (\alpha_i - \alpha_{R_i}) \alpha_{U_i} + (\beta_i - \beta_{R_i}) \beta_{U_i} \tag{14} $$


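Now we can write the error function ε_i for each
corresponding pair by replacing d_i in (10) by its expression
(13), and after manipulation, we have:

$$ \varepsilon_i = \frac{\left[ (\alpha_i - \alpha_{R_i}) \beta_{U_i} - (\beta_i - \beta_{R_i}) \alpha_{U_i} \right]^2}{\alpha_{U_i}^2 + \beta_{U_i}^2} \qquad \text{if } \delta_i > 0 \tag{15} $$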

The function to be minimized, E(U, Ω), as a function of
only the motion parameters is:

$$ E(U, \Omega) = \sum_{i=1}^{n} \varepsilon_i + \lambda (\Omega_x^2 + \Omega_y^2 + \Omega_z^2) \tag{16} $$


This function is defined only when all δ_i are positive.
We now have to search the space to find the minimum of
this error function. As we choose a unit translation
vector, we can simply search the surface of the sphere
with unit radius. Furthermore, equation (16) returns the
same value for symmetric points on the sphere, so we
only have to search the surface of one hemisphere and
check condition (14) for both points.

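Given a point on the hemisphere, three linear equations
as a function of Ω_x, Ω_y, Ω_z are obtained by taking the
partial derivatives with respect to them:

$$ \frac{\partial E(U, \Omega)}{\partial \Omega_x} = \sum_{i=1}^{n} \frac{\partial \varepsilon_i}{\partial \Omega_x} + 2 \lambda \Omega_x = 0 $$

$$ \frac{\partial E(U, \Omega)}{\partial \Omega_y} = \sum_{i=1}^{n} \frac{\partial \varepsilon_i}{\partial \Omega_y} + 2 \lambda \Omega_y = 0 \tag{17} $$

$$ \frac{\partial E(U, \Omega)}{\partial \Omega_z} = \sum_{i=1}^{n} \frac{\partial \varepsilon_i}{\partial \Omega_z} + 2 \lambda \Omega_z = 0 $$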

From these linear equations, the three rotation angles
are obtained. Using these three rotation angles, the value
of the error function (16) is obtained at each point on the
hemisphere.
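For a fixed translation direction, the rotation angles follow from a 3 x 3 linear system. The following is a minimal sketch, assuming each per-point error is the squared component of the rotation-compensated displacement perpendicular to the translational direction, which is linear in Ω before squaring. The names `rotation_basis` and `solve_rotation`, the synthetic data, and the values of λ are illustrative assumptions; the positivity check on the relative depths is omitted here.

```python
import numpy as np

def rotation_basis(X, Y, f):
    """Rows g_i, h_i with alpha_R = g_i . Omega and beta_R = h_i . Omega."""
    g = np.stack([-X * Y / f, f + X ** 2 / f, -Y], axis=1)
    h = np.stack([-(f + Y ** 2 / f), X * Y / f, X], axis=1)
    return g, h

def solve_rotation(X, Y, alpha, beta, U, f, lam):
    """Minimize sum_i eps_i + lam * |Omega|^2 over Omega for a fixed U."""
    X, Y, alpha, beta = (np.asarray(v, dtype=float) for v in (X, Y, alpha, beta))
    ux, uy, uz = U
    alpha_u = ux * f - uz * X
    beta_u = uy * f - uz * Y
    norm = np.sqrt(alpha_u ** 2 + beta_u ** 2)
    g, h = rotation_basis(X, Y, f)
    # eps_i = (c_i - m_i . Omega)^2, a linear least-squares residual in Omega.
    M = (beta_u[:, None] * g - alpha_u[:, None] * h) / norm[:, None]
    c = (alpha * beta_u - beta * alpha_u) / norm
    omega = np.linalg.solve(M.T @ M + lam * np.eye(3), M.T @ c)
    error = np.sum((c - M @ omega) ** 2) + lam * omega @ omega
    return omega, error

# Synthetic first-order data for 8 points (angles in radians).
rng = np.random.default_rng(0)
f = 500.0
X = rng.uniform(-100.0, 100.0, size=8)
Y = rng.uniform(-100.0, 100.0, size=8)
omega_true = np.deg2rad([1.0, 0.0, -1.0])
U = np.array([0.5, 0.5, 1.0]) / np.linalg.norm([0.5, 0.5, 1.0])
d = rng.uniform(0.005, 0.02, size=8)
g, h = rotation_basis(X, Y, f)
alpha = g @ omega_true + (U[0] * f - U[2] * X) * d
beta = h @ omega_true + (U[1] * f - U[2] * Y) * d

omega_0, err_0 = solve_rotation(X, Y, alpha, beta, U, f, lam=0.0)
omega_reg, err_reg = solve_rotation(X, Y, alpha, beta, U, f, lam=1e4)
print("lam = 0  :", np.rad2deg(omega_0))
print("lam = 1e4:", np.rad2deg(omega_reg))
```

With noise-free first-order data and the correct U, the unregularized solve recovers the rotation, and a positive λ pulls the estimate toward zero. The numeric scale of λ depends on the units chosen for Ω and for the image coordinates, so the values here are not comparable to those reported in the experiments.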

Since we are searching on a hemisphere
representing the possible values of the (normalized)
translation vector, we can parameterize this space with 2
bounded variables, θ and φ, and express any unit vector
U = (U_x, U_y, U_z)^T as:

$$ U_x = \sin\phi \cos\theta, \qquad U_y = \sin\phi \sin\theta, \qquad U_z = \cos\phi \tag{18} $$

So we use a cartesian system in which the 'x' axis
represents θ and the 'y' axis represents φ. The two-level
search procedure is as follows:

1) Coarse search: We first quantize the θ, φ cartesian space
into a 36 by 36 array, corresponding to 10° steps along θ
and 2.5° steps along φ. For each cell, we compute the
function E(U, Ω) and check that all δ_i have the same sign.

2) Fine search: In the cell(s) in which the minimum
occurred, we perform a step by step search in 1°
increments for both axes.
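The two-level search above can be sketched as follows. This is a minimal illustration, assuming each grid sample is treated as a cell center, that the fine stage scans plus or minus 5° in θ and plus or minus 2° in φ around the best coarse sample, and that inadmissible directions are signalled by an infinite error. The toy error function and all names are assumptions made for the example.

```python
import numpy as np

def unit_vector(theta_deg, phi_deg):
    """Unit vector for azimuth theta and polar angle phi (degrees)."""
    theta, phi = np.deg2rad(theta_deg), np.deg2rad(phi_deg)
    return np.array([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)])

def coarse_to_fine_search(error_fn):
    """Return (theta, phi, error) minimizing error_fn over the hemisphere."""
    best_e, best_theta, best_phi = np.inf, None, None
    # Coarse stage: 36 x 36 samples, 10 degrees in theta, 2.5 degrees in phi.
    for theta in 10.0 * np.arange(36):
        for phi in 2.5 * np.arange(36):
            e = error_fn(unit_vector(theta, phi))
            if e < best_e:
                best_e, best_theta, best_phi = e, theta, phi
    if best_theta is None:
        raise ValueError("no admissible direction found")
    # Fine stage: 1 degree steps around the best coarse sample.
    theta0, phi0 = best_theta, best_phi
    for theta in theta0 + np.arange(-5.0, 6.0):
        for phi in phi0 + np.arange(-2.0, 3.0):
            if not 0.0 <= phi <= 90.0:
                continue
            e = error_fn(unit_vector(theta % 360.0, phi))
            if e < best_e:
                best_e, best_theta, best_phi = e, theta % 360.0, phi
    return best_theta, best_phi, best_e

# Toy error: squared distance to a known direction (stands in for E(U, Omega)).
target = unit_vector(225.0, 55.0)

def toy_error(U):
    return float(np.sum((U - target) ** 2))

theta_hat, phi_hat, e_hat = coarse_to_fine_search(toy_error)
print(theta_hat, phi_hat, e_hat)
```

In a full implementation the error function would solve for the rotation at each direction and return an infinite value when the sign condition on the relative depths fails, so that such cells never win the comparison.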




Figure 2 shows the shape of the error function
obtained by the first stage, which is described in the next
section, and in black below it the regions where a solution
is admissible, that is, where condition (14) is met. These
drawings correspond to the first data set of table 1.


4.  3-D  MOTION  PARAMETERS  WHEN  8  POINTS  ARE 
MATCHED 


In this section, we generate synthetic data by choosing
a set of 3-D points, inputting the translation vector, the
rotation vector and the focal length of the camera, and
generating a set of 2-D corresponding points in the image
plane. The coordinates of these points are real numbers,
but we can simulate digitization noise by using the
truncated corresponding integer values, or by restricting
the number of significant digits.
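The data generation step can be sketched as below. This is an illustrative reconstruction, assuming the small-angle rotation matrix is used to move the points and that digitization is simulated by rounding to the closest integer (the text mentions both truncation and the closest integer). The function name, the random point cloud, and the parameter values are assumptions, not the data of Table 1.

```python
import numpy as np

def make_correspondences(points, omega_deg, T, f, quantize=False):
    """Project 3-D points before and after a rigid motion.

    points: (n, 3) array of (x, y, z); omega_deg: rotation angles in degrees.
    Returns two (n, 2) arrays of image coordinates (before, after).
    """
    points = np.asarray(points, dtype=float)
    wx, wy, wz = np.deg2rad(omega_deg)
    # Small-angle rotation matrix (an assumption of this sketch).
    R = np.array([[1.0, -wz, wy], [wz, 1.0, -wx], [-wy, wx, 1.0]])
    moved = points @ R.T + np.asarray(T, dtype=float)
    before = f * points[:, :2] / points[:, 2:3]
    after = f * moved[:, :2] / moved[:, 2:3]
    if quantize:
        # Simulated digitization noise: keep pixel accuracy only.
        before, after = np.rint(before), np.rint(after)
    return before, after

rng = np.random.default_rng(1)
points = np.column_stack([rng.uniform(-200.0, 200.0, 8),
                          rng.uniform(-200.0, 200.0, 8),
                          rng.uniform(800.0, 1200.0, 8)])
motion = dict(omega_deg=[1.0, 0.0, -1.0], T=[5.0, 5.0, 10.0], f=500.0)

real_before, real_after = make_correspondences(points, **motion)
int_before, int_after = make_correspondences(points, quantize=True, **motion)

print("largest rounding error (pixels):", np.abs(int_after - real_after).max())
```

Rounding changes each coordinate by at most half a pixel, which is small relative to the coordinates themselves but comparable to a noticeable fraction of the displacement when the motion is small.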


[Table 1 body: two sets (DATA 1 and DATA 2) of 8 corresponding
image points; the numeric entries and the motion parameters
listed beneath them are illegible in the source scan.]


Table 1: Point-Point Correspondences in 2 Synthetic
Images

Table 1 shows 2 sets of 8 corresponding points which
we use in this experiment. The rotation and translation
parameters are given below the corresponding pairs. In
order to interpret the results of the experiment, we need
to display our error function in a meaningful and efficient
manner, which was explained in the previous section. For
reasons of clarity, we do not display the error function E,
but -E, so that we have to look at the maximum of -E
(corresponding to the minimum of E). For all figures, a
white arrow indicates the expected position of the
extremum, and a dark arrow shows the computed position
of the extremum.

For the first data set, the correct answer should be
θ = 225°, φ = 55°. Figure 3 shows the shape of the error
function for 3 different values of λ (0, 0.1, 0.5). We can see
the smoothing effect of the regularization term on the
error function. In all cases, the solution found is very
close to the desired one, which is expected since our data
is not noisy.

Figure 2: Error Function and the Space of Acceptable Solutions (DATA 1)

The black area shows the admissible space
for which all points have positive d_i.


Table 2 shows the motion parameters obtained with
the different values of λ on this first data set. For
completeness, we also include the results obtained by the
linear method of Tsai and Huang [19]. For that method, U_z
is always 1.0, so that -U instead of U may be found. For
real input, all methods give small errors, ranging from 0 %
to 5 % for the rotation parameters and from 0 % to 20 %
for the translation parameters.

In order to understand the effects of noise, we convert
the real numbers to the closest integer. Given the range
of our data, the error introduced could be as large as
15%, with an average of less than 0.5%. This would be
equivalent to choosing features in an image with pixel
accuracy, which is common in "real world" images. Even
with such a small amount of noise, the error function
exhibits some peaks leading to erroneous motion
parameters. The effect of the regularization factor on the
shape of the error function is demonstrated in figure 3 for
3 values of the parameter λ (0.0, 5.0, 10.0). For λ = 0.0 the
searching procedure produces a spurious maximum far
from the expected value. As we increase the value of λ,
the maximum gets closer to the desired value. The results
are summarized in table 2. In this table, for integer input,
we observe that the error decreases as λ increases. The
rotation parameters are very well estimated by the linear
method, but not the translation parameters.

We repeat the procedure on a different data set, shown
in table 1, in which the Ω vector is not symmetric. For
this example the correct answer should be θ = 45°, φ = 35°,
Ω_x = 1.0°, Ω_y = 0°, Ω_z = -1.0°. Using all significant digits of
the input data and no regularization, we obtain
θ = 45°, φ = 35° (U_x = 0.5, U_y = 0.5, U_z = 1.0) and the associated
rotation vector Ω_x = 0.99°, Ω_y = -0.01°, Ω_z = -1.01°.

If we now use integer values for input, we find the
totally incorrect parameters θ = 155°, φ = 37°. These results
are to be compared with the parameters obtained by
searching the space for an error function including a
regularization factor with λ = 5.0. The obtained results,
including the linear method [19], are summarized in table 3. In
this example both the linear and no-regularization
methods give quite incorrect parameters for translation

[Table 2 body: for real input and for integer input, the
estimated Ω_x, Ω_y, Ω_z (degrees), U_x, U_y, U_z, θ and φ with
their errors, for the linear method [19] and for the search at
several values of λ; the numeric entries are illegible in the
source scan.]

Input data: θ = 225°, φ = 55° (U_x = U_y = -U_z), Ω_x = 1.4°, Ω_y = 1.5°, Ω_z = 1.5°

Table 2: The Motion Parameters (DATA 1)


and rotation. With regularization, the value is not
absolutely correct, but stays close to the expected value,
even though the input is not very accurate. The estimated
amount of noise in the input is between 0.35 and 0.9 %.
The corresponding error functions are displayed in figure
4.


DATA 2 is a typical example demonstrating the
improvement due to regularization. The non-regularized search
and the linear method [19] perform very poorly here. We
generated many examples with many combinations of
small rotation angles and any translation parameters,
where 10 to 30 pixel displacements were observed on the
image plane. This assumption is reasonable for a camera
looking at an object off center. We found that, with no
exceptions, regularization vastly improves the results.

Figure 3: Error function for DATA 1 (panels: Real Input, Integer Input)

Input Data: θ = 225°, φ = 55° (U_x = U_y = -U_z), Ω_x = 1.4°, Ω_y = 1.5°, Ω_z = 1.5°

[Table 3 body: for real input and for integer input, the
estimated Ω_x, Ω_y, Ω_z (degrees), U_x, U_y, U_z, θ and φ with
their errors, for the linear method [19] and for the search
with and without regularization; the numeric entries are
illegible in the source scan.]

Table 3: The Motion Parameters (DATA 2)

Figure 4: Error function for DATA 2 (panels: Real Input, Integer Input)

Input Data: θ = 45°, φ = 35° (U_x = U_y = U_z/2), Ω_x = 1.0°, Ω_y = 0°, Ω_z = -1.0°


5. MORE THAN 8 CORRESPONDING POINTS


In this section, we take a look at how the error
function varies as a function of the number of
corresponding points. To obtain the corresponding points,
we proceed as previously by taking a random collection of
3-D points, choosing the motion parameters and
computing the resulting associated pairs. The error
function is displayed in Figure 5, for the values N = 8, 16, 22
and 30 of the number of corresponding pairs, with and
without the regularization term.

As we can clearly see, the more points we take into
account, the smoother the error function becomes. If
we use a regularizing factor, the error function is already
smooth for N = 8, and in this case a more accurate value can
be obtained than with the error function without regularization.

This indicates that the effects of the regularization are
especially useful when only a few pairs are known.

6.  MORE  THAN  2  CONSECUTIVE  FRAMES 

Many researchers have proposed using three (or more)
consecutive frames for motion analysis [20, 21, 9, 13].
Human perception of motion seems to be very noise
sensitive when only 2 frames of moving dot patterns are
presented [8].

Here, we create 5 consecutive images in which the
motion parameters stay constant, and generate 4 sets of
corresponding pairs. In this example, we assume that the
camera is moving with constant parameters and that the
scene stands still. This assumption is reasonable if
images are obtained from a camera attached to a robot
manipulator moving with constant rotation and translation
parameters. With these assumptions we can write, for any
point in frame i:

$$ (x_{i+1} \;\, y_{i+1} \;\, z_{i+1})^T = R \, (x_i \;\, y_i \;\, z_i)^T + T \tag{19} $$

We compute the error function for the 4 consecutive
frame pairs. All data are truncated to the closest integer,
in order to simulate noise, and the amount depends on the
value of the coordinates on the image plane. In this example,
from frame 1-2 through 4-5, the amount of noise is 0.2 -
0.4 %, 0.3 - 0.6 %, 0.3 - 0.8 %, and 0.3 - 1.2 %
respectively. Although we use the same motion
parameters, the computed values vary between frames, as
shown in figure 6, where the right column represents the
error function with regularization (λ = 5.0) and the left one
the error function without regularizing factor (λ = 0.0). The
last row shows the error function obtained by averaging
the 4 previous error functions. In table 4, we show the
values obtained from these image sequences, including the
results of the linear method [19].


The computed extrema of the error function are very
similar for the 5 inter-frames when we use regularization,
which is not true when λ = 0.0. We observe that computed
errors without regularization increase according to the
errors included in images. As shown on the last row of
the figure, searching the sum of the 4 error functions
leads to a nearly correct extremum only with the
regularization term.

This serves to demonstrate that smoothing the error
function by simply averaging it over a time sequence is
not enough to overcome the fact that the problem is
ill-posed.

[Table 4 body: for each consecutive frame pair, the estimated
Ω_x, Ω_y, Ω_z (degrees), U_x, U_y, U_z, θ and φ with their
errors, for the linear method [19] and for the search with and
without regularization; the numeric entries and the input data
line are illegible in the source scan.]

Table 4: The Motion Parameters (Consecutive Images)


7. CONCLUSION AND FUTURE RESEARCH


In this experimental study, we have made clear the
usefulness of a regularizing term in at least 2 instances:

- in the presence of noise,

- when only a few points are known.

We found that such a technique stabilizes the value of the
desired parameters in a "noisy" image, and may be
applicable in pixel-based image analysis and real time
motion analysis.

One problem that has not been examined here is the
optimal choice of the parameter λ. It obviously depends
on the amount of noise in the image and could be
automatically set if a valid model of both image and noise
were possible to derive, as in a previous work [12].

Our final remarks relate to the shape of the error
function, which is nearly flat on large portions of the
search space, as noted by [1], making it very difficult to
select an extremum. This seems to suggest that the
formulations of the problem used so far do not capture
enough information, or do not put enough restrictions on
the solution. A different approach is needed, where the
solution would appear as a sharp peak. One possibility
being currently investigated is to apply image plane
acceleration to motion analysis [16].


ACKNOWLEDGMENTS 


We would like to thank Michel Medioni for deriving
equation (5), and Hormoz Shariat for fruitful discussions.


References 

[1] Adiv, G.
Determining 3-D Motion and Structure from Optical
Flow Generated by Several Objects.
In Proceedings of Image Understanding Workshop,
October 1984, pages 113-129. DARPA, Science
Application International Corporation, 1984.

[2] Arsenin, V. Ya.
Regularization Method.
USSR Comp. Math. 8, 1968.

[3] Bobick, A.
A Hybrid Approach to Structure-from-Motion.
In Proc. ACM Interdisc. Workshop on Motion,
Toronto, pages 91-109, 1983.

[4] Bruss, A. R. and Horn, B. K. P.
Passive Navigation.
Massachusetts Institute of Technology,
A. I. Memo 662, 1981.

[5] Clocksin, W. F.
Perception of Surface Slant and Edge Labels from
Optical Flow: A Computational Approach.
Perception, 1980.

[6] Dreschler, L. and Nagel, H.-H.
Volumetric Model and 3-D Trajectory of a Moving
Car Derived from Monocular TV-frame Sequence
of a Street Scene.
In Proceedings of 7th IJCAI,
Vancouver, Canada, August 1981.

[7] Fang, J.-Q. and Huang, T. S.
Some Experiments on Estimating the 3-D Motion
Parameters of a Rigid Body from Two
Consecutive Image Frames.
IEEE Trans. on Pattern Analysis and
Machine Intelligence 6(5):545-554, 1984.

[8] Lappin, J. S., Doner, J. F. and Kottas, B. L.
Minimal Conditions for the Visual Detection of
Structure and Motion in Three Dimensions.
Science 209:717-719, 1980.

[9] Lawton, D. T.
Constraint-Based Inference from Image Motion.
In Proceedings of the First Annual
National Conference on Artificial
Intelligence, August 1980.

[10] Lee, D. N.
The Optical Flow Field: The Foundation of Vision.
Phil. Trans. Royal Soc., London, 1980.

[11] Longuet-Higgins, H. C.
A Computer Algorithm for Reconstructing a Scene
from Two Projections.
Nature, 1981.

[12] Medioni, G. and Moriamez
Tomographie: Reconstruction d'Images par
Deconvolution Numerique et Optique dans le
Plan de Fourier.
Tech. Report ENST-H-77002, Ecole
Nationale Superieure des
Telecommunications, June 1977.

[13] Meiri, A. Z.
On Monocular Perception of 3-D Moving Objects.
IEEE Trans. on Pattern Analysis and
Machine Intelligence 2:582-583, 1980.

[14] Poggio, T. and Torre, V.
Ill-posed Problems and Regularization Analysis in
Early Vision.
In Proceedings of Image Understanding
Workshop, October 1984. DARPA, Science
Application International Corporation, 1984.

[15] Roach, J. W. and Aggarwal, J. K.
Determining the Movement of Objects from a
Sequence of Images.
IEEE Trans. on Pattern Analysis and
Machine Intelligence 2:554-562, 1980.

[16] Shariat, H.
The Motion Problem: A Decomposition-Based
Solution.
In Proceedings of IEEE Conference on
Computer Vision and Pattern
Recognition, San Francisco, pages
181-183, 1985.

[17] Sugie, N. and Inagaki, H.
A Computational Aspect of Kinetic Depth Effect.
Biol. Cybern. 50:431-436, 1984.

[18] Tikhonov, A. N.
The Regularization of Ill-Posed Problems.
Dokl. Akad. Nauk SSSR 153(1):49-52, 1963.

[19] Tsai, R. Y. and Huang, T. S.
Uniqueness and Estimation of Three-Dimensional
Motion Parameters of Rigid Objects with Curved
Surfaces.
IEEE Trans. on Pattern Analysis and
Machine Intelligence 6(1):13-26, January
1984.

[20] Ullman, S.
The Interpretation of Visual Motion.
Massachusetts Institute of Technology Press, 1979.

[21] Ullman, S.
Maximizing Rigidity: The Incremental Recovery of
3-D Structure from Rigid and Non-Rigid Motion.
Perception 13:255-274, 1984.

[22] Webb, J. A. and Aggarwal, J. K.
Structure from Motion of Rigid and Jointed Bodies.
In Proceedings of 7th IJCAI,
Vancouver, Canada, 1981.




Contour,  Orientation  and  Motion 

John  Aloimonos.  Anup  Basu  and  Christopher  M.  Brown 

Computer  Science  Department 
University  of  Rochester 
Rochester,  New  York  14627 




Abstract 

Intrinsic image calculation exploits constraints arising
from physical and imaging processes to derive physical
scene parameters from input images. After a brief review
of a paradigmatic intrinsic image calculation we turn to a
new result that derives shape and motion from a sequence
of patterned inputs. Experimental results are demonstrated
for synthetic images.

1. Shape, Orientation, and Paraperspective

One of the first and best-known examples of intrinsic
image calculation [Barrow & Tenenbaum 1978] is the
recovery of shape from intensity [e.g. Horn 1978, Ikeuchi
& Horn 1981]. Shape of a smooth surface is defined to be
local surface orientation or viewer-centered relative depth,
which are derivable from each other. Orientation is
parameterized by the unit normal vector of the surface
f(x,y), or by the projection of such vectors with z>0 from
the origin onto the plane z = 1. This projection yields the
popular (p,q) or gradient space representation commonly
used in reasoning about line drawings. A polar version of
(p,q) space is (slant, tilt) space. For example, the
orientation of a plane that is tilted in an upward direction
lies somewhere on the positive p axis, the farther away
from the origin the greater the slant of the plane. At
infinite slant the plane is edge-on to the viewer, with its
normal perpendicular to the line of sight.
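The relation between gradient space and (slant, tilt) space can be made concrete with a short sketch. It assumes the common convention in which the slant angle is the arctangent of the distance from the origin of gradient space and the tilt is the polar angle of (p, q). The function names are illustrative, and the sign convention of the normal is an assumption that the paper does not spell out here.

```python
import math

def gradient_to_slant_tilt(p, q):
    """Polar form of a gradient-space point: slant and tilt in radians."""
    slant = math.atan(math.hypot(p, q))  # grows toward pi/2 as (p, q) recedes
    tilt = math.atan2(q, p)
    return slant, tilt

def slant_tilt_to_gradient(slant, tilt):
    """Inverse mapping, valid for slant in [0, pi/2)."""
    r = math.tan(slant)
    return r * math.cos(tilt), r * math.sin(tilt)

def gradient_to_unit_normal(p, q):
    """Unit normal with z > 0 whose projection onto z = 1 is (p, q)."""
    n = math.sqrt(1.0 + p * p + q * q)
    return p / n, q / n, 1.0 / n

slant, tilt = gradient_to_slant_tilt(1.0, 0.0)
print("slant (deg):", math.degrees(slant), "tilt (deg):", math.degrees(tilt))
print("normal:", gradient_to_unit_normal(1.0, 0.0))
```

A point on the positive p axis has zero tilt, and its slant approaches 90° as it moves away from the origin, matching the edge-on limit described above.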

Ohta's [1981] perspective approximation (called
paraperspective here) is useful. Let a coordinate system
OXYZ be fixed with respect to the camera, with the -Z
axis pointing along the optical axis, and O the nodal point
of the eye (center of the lens). The image plane is assumed
to be perpendicular to the Z axis at the point (0,0,-1) (i.e.
focal length = 1). Ohta approximates perspective
projection by a two-stage affine transformation. A texel is
taken to lie on the tangent plane Q of the surface at the
texel's centroid. Q has orientation (p,q). A plane P is
erected through the texel centroid parallel to the image
plane, and the texel's shape is projected, parallel to the line
joining the viewpoint and the texel centroid, onto P. This
first stage is a skew transformation. The second stage is a
true point projection from plane P to the image plane
through the viewpoint. Since P and the image plane are
parallel, this projection amounts to a pure scaling by some
constant factor, say 1/β. Paraperspective approximates
location- and depth-dependent perspective foreshortening
and size distortion, and the approximations are quite good
for small shapes.


To represent the original pattern of the surface texel,
we use an (a,b,c) coordinate system, with its origin at the
mass center of the texel and the (a,b) plane identical to
the plane Q. To represent the pattern of the image texel,
we use an (a',b',c') coordinate system, with its origin at the
point (A,B,-1), where (A,B) is the mass center of the image
texel, and the axes a',b',c' parallel to the axes X,Y,Z
respectively. Then the transformation from (a,b) to (a',b')
under the two-step projection process described above is
given by the affine transformation of equation (*). In our
work we choose one P plane for the entire image, which
implies that depth variation must be small relative to
depth.


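$$
\begin{pmatrix} a' \\ b' \end{pmatrix}
= \frac{1}{\beta}
\begin{pmatrix}
\dfrac{1-pA}{\sqrt{1+p^2}} & \dfrac{-q(A+p)}{\sqrt{(1+p^2)(1+p^2+q^2)}} \\[2ex]
\dfrac{-pB}{\sqrt{1+p^2}} & \dfrac{1+p^2-qB}{\sqrt{(1+p^2)(1+p^2+q^2)}}
\end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
\qquad (*)
$$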


This transformation expresses the relation between two
2-D patterns, one in 3-D space and the other its image
on the image plane. We now use it to develop a basic
constraint.

2.  The  Area  Ratio  Constraint 

The determinant of the matrix of an affine
transformation is equal to the ratio of the areas of the two
patterns before and after the transformation. Specifically,
if S_W is the area of a world texel that lies on a plane with
gradient (p,q) and S_I is the area of its image that has mass
center (A,B), then we have:

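$$
\frac{S_I}{S_W} = \frac{1}{\beta^2}
\begin{vmatrix}
\dfrac{1-pA}{\sqrt{1+p^2}} & \dfrac{-q(A+p)}{\sqrt{(1+p^2)(1+p^2+q^2)}} \\[2ex]
\dfrac{-pB}{\sqrt{1+p^2}} & \dfrac{1+p^2-qB}{\sqrt{(1+p^2)(1+p^2+q^2)}}
\end{vmatrix}
$$

or

$$
\frac{S_I}{S_W} = \frac{1}{\beta^2}\,\frac{1 - Ap - Bq}{\sqrt{1+p^2+q^2}}
$$

or

$$
S_I = \frac{S_W}{\beta^2}\,\frac{1 - Ap - Bq}{\sqrt{1+p^2+q^2}} \qquad (**)
$$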


Equation (**) relates the area of a world texel S_W, its
gradient (p,q), the area S_I of its image and its mass center
(A,B). If we call the quantity S_I "textural intensity," and




the quantity S_W/β² "textural albedo," then equation (**)
is very similar to the image irradiance equation for
Lambertian surfaces:

$$
I = \lambda\,\frac{1 - Ap - Bq}{\sqrt{1+p^2+q^2}}
$$

where (p,q) is the gradient of the surface point whose
image has intensity I, λ is the albedo at that point and
(A,B,1) the direction of the light source [Horn, 1977;
Ikeuchi, 1981]. Thus equation (**) can be used to recover
surface orientation. This shape from texture work is
reported in [Aloimonos & Chou, 1985; Aloimonos &
Swain, 1985].
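
The area ratio constraint can be checked numerically. The following is a minimal sketch, not the authors' implementation: it carries out the two paraperspective stages directly on a pair of orthonormal axes lying in the plane -Z = pX + qY + c, and compares the determinant of the resulting 2x2 map with the right-hand side of (**) divided by S_W. The function names and the particular choice of in-plane axes are assumptions made for illustration.

```python
import math

def paraperspective_matrix(p, q, A, B, beta):
    """2x2 affine map from in-plane coordinates (a, b) to image offsets (a', b')."""
    n1 = math.sqrt(1 + p * p)
    n2 = math.sqrt((1 + p * p) * (1 + p * p + q * q))
    # Orthonormal axes lying in the plane -Z = pX + qY + c (an assumed choice).
    a_axis = (1 / n1, 0.0, -p / n1)
    b_axis = (-p * q / n2, (1 + p * p) / n2, -q / n2)
    cols = []
    for sx, sy, sz in (a_axis, b_axis):
        # Stage 1: slide along the centroid ray (A, B, -1) back onto plane P.
        # Stage 2: scale by 1/beta to reach the image plane.
        cols.append(((sx + A * sz) / beta, (sy + B * sz) / beta))
    return ((cols[0][0], cols[1][0]),
            (cols[0][1], cols[1][1]))

def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def area_ratio_constraint(p, q, A, B, beta):
    """Right-hand side of (**) divided by S_W, i.e. the predicted S_I / S_W."""
    return (1 - A * p - B * q) / (beta ** 2 * math.sqrt(1 + p * p + q * q))
```

For any gradient and centroid with 1 - Ap - Bq > 0, `determinant(paraperspective_matrix(...))` and `area_ratio_constraint(...)` should agree to rounding error.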


3. Orientation from Contour Without Point
Correspondence

Planar surface orientation may be recovered from
contour information. In fact, the change in the perceived
area of a planar contour between different cameras in a
known configuration is enough to recover the 3-D
structure of the contour, without knowledge of point to
point correspondence between the different images.


The recovery of three-dimensional shape and surface
orientation from a two-dimensional contour is a
fundamental process in any visual system. Recently, a
number of methods have been proposed for computing
this shape from contour. For the most part, previous
techniques have concentrated on trying to identify a few
simple, general constraints and assumptions that are
consistent with the nature of all possible objects and
imaging geometries in order to recover a single "best"
interpretation from among the many possible for a given
image. For example, Kanade [1981] defines shape
constraints in terms of image space regularities such as
parallel lines and skew symmetries under orthographic
projection. Witkin [1981] looks for the most uniform
distribution of tangents to a contour over a set of possible
inverse projections in object space under orthography.
Similarly, Brady and Yuille [1984] search for the most
compact shape (using the measure of area over perimeter
squared) in the object space of inverse projected planar
contours.


Rather than attempting to maximize some general
shape-based evaluation function over the space of possible
inverse projective transforms of a given image contour, we
propose to find a unique solution by using more than one
camera, since it can easily be proved that a single image
(under orthography or perspective) of a planar contour
admits infinitely many interpretations of the structure of
the world plane on which the contour lies. The need for a
unique solution, which is guaranteed in our approach,
also comes from the fact that there exist many real world
counterexamples to the evaluation functions that have
been developed to date. For example, Kanade's and
Witkin's measures incorrectly estimate surface orientation
for regular shapes such as ellipses (which are often
interpreted as slanted circles). Brady's compactness
measure does not correctly interpret non-compact figures
such as rectangles, since it interprets a rectangle as a
rotated square (e.g., if we view a rectangular table top, we
do not see it as a rotated square surface, but as a rotated
rectangle).

In the sequel, we present two methods for the unique
recovery of shape from contour, one based on three views
and the other based on two views, without having to solve
the point to point correspondence between the different
images of the contour. We proceed with the following
proposition (Fig. 1).


Proposition: Let a coordinate system OXYZ be fixed,
with the -Z axis pointing along the optical axis. We
consider that the image plane Im_1 is perpendicular to the
Z axis at the point (0,0,-1). Let Π be a plane in the world
with equation -Z = pX + qY + c, where (p,q) is the gradient
of the plane, and let Π contain a contour C. Furthermore, we
consider two more cameras with image planes Im_2 and
Im_3, whose coordinate systems (nodal points) are such that
any world point has the same depth with respect to any of
the cameras. Then, assuming paraperspective projection of
the contour C onto any of the image planes, the images
C_1, C_2 and C_3 of the contour on the three cameras are
enough to determine uniquely the orientation of the plane
Π, without having to solve the point to point
correspondence between C_1, C_2 and C_3.

Proof: Let S_1, S_2 and S_3 be the areas of the contours
C_1, C_2 and C_3 respectively. Let also the depth of the
center of gravity of the contour C be β. If S_W is the area of
the contour C on the plane Π, and (A_1,B_1), (A_2,B_2) and
(A_3,B_3) are the centers of gravity of the image contours C_1,
C_2 and C_3 respectively, then the area ratio constraint (**)
gives:

$$
\frac{S_1}{S_W} = \frac{1}{\beta^2}\,\frac{1 - A_1 p - B_1 q}{\sqrt{1+p^2+q^2}} \qquad (1)
$$

$$
\frac{S_2}{S_W} = \frac{1}{\beta^2}\,\frac{1 - A_2 p - B_2 q}{\sqrt{1+p^2+q^2}} \qquad (2)
$$

$$
\frac{S_3}{S_W} = \frac{1}{\beta^2}\,\frac{1 - A_3 p - B_3 q}{\sqrt{1+p^2+q^2}} \qquad (3)
$$

Dividing the above equations appropriately, we derive

$$
\frac{S_1}{S_2} = \frac{1 - A_1 p - B_1 q}{1 - A_2 p - B_2 q} \qquad (4)
$$

$$
\frac{S_2}{S_3} = \frac{1 - A_2 p - B_2 q}{1 - A_3 p - B_3 q} \qquad (5)
$$

Equations (4) and (5) constitute a linear system with
unknowns p and q, which in general has a unique solution
(q.e.d.).




A degenerate case in the solution of the above system
arises when the centers of all three image planes are
collinear. Experiments using the above method on
perspective images computed the orientation of the world
contour with great accuracy. Although the
paraperspective projection is only an approximation of the
perspective projection, and the error depends on many
factors (slant, tilt, depth, size of the contour; for a detailed
discussion, see [Aloimonos & Chou, 1985]), it seems that in
the above method much of the introduced error is
cancelled. This was brought to our attention
by extensive experiments. We are currently working
towards a theoretical explanation of this error
cancellation.
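
As an illustration of the three-view method, the sketch below cross-multiplies the two area-ratio equations (S_1/S_2 and S_2/S_3) into a 2x2 linear system in p and q and solves it by Cramer's rule. It is a minimal sketch under the paraperspective assumptions of this section, not the authors' program; the function name, the argument layout and the singularity threshold are hypothetical.

```python
def gradient_from_three_views(areas, centroids, eps=1e-12):
    """Recover the plane gradient (p, q) from three image areas and centroids.

    areas     -- (S1, S2, S3), areas of the three image contours
    centroids -- ((A1, B1), (A2, B2), (A3, B3)), their centers of gravity
    """
    S1, S2, S3 = areas
    (A1, B1), (A2, B2), (A3, B3) = centroids
    # S1/S2 = (1 - A1 p - B1 q) / (1 - A2 p - B2 q), cross-multiplied:
    a11, a12, b1 = S2 * A1 - S1 * A2, S2 * B1 - S1 * B2, S2 - S1
    # S2/S3 = (1 - A2 p - B2 q) / (1 - A3 p - B3 q), cross-multiplied:
    a21, a22, b2 = S3 * A2 - S2 * A3, S3 * B2 - S2 * B3, S3 - S2
    det = a11 * a22 - a12 * a21
    if abs(det) < eps:
        raise ValueError("degenerate camera configuration: system is singular")
    p = (b1 * a22 - a12 * b2) / det
    q = (a11 * b2 - a21 * b1) / det
    return p, q
```

Only the three areas and the three centers of gravity enter the computation; no individual contour point is matched between views.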

4.  Solving  the  Problem  with  Two  Frames 

In the previous section, we used three frames for the
recovery of shape from contour. But the only information we
used from the image contours was their area, and in
particular how the area changed from view to view. A
useful piece of information that we have not yet utilized is
the length of the contour (which is of course independent
of its area). Using this information, we can solve the shape
from contour problem with two projections (binocular
observer), but in a computationally much harder way
involving nonlinear equations.

Consider a coordinate system OXYZ fixed with
respect to the left camera, with the -Z axis again pointing
along the optical axis. We consider that the image plane of
the left camera is perpendicular to the Z axis at the point
(0,0,-1). The nodal point of the right camera is the point
(Δx,0,0) and the image plane of the right camera is
identical to the one of the left camera (Fig. 2). C is a
contour on a world plane Π with equation -Z = pX + qY
+ c, and C_L and C_R are the projections of the contour C
on the left and right image respectively using the
paraperspective projection. We can easily prove that a
small line segment (l cos θ, l sin θ) on the image plane is
due to the projection of a line segment on the world plane
with length L, where

$$
L = \frac{\beta\, l}{1 - Ap - Bq}\,
\sqrt{k_1\cos^2\theta + k_2\sin^2\theta + k_3\sin\theta\cos\theta}
$$

where:

$$ k_1 = (1-qB)^2 + (pB)^2 + p^2 $$

$$ k_2 = (1-pA)^2 + (qA)^2 + q^2 $$

$$ k_3 = 2\left((1-qB)\,qA + (1-pA)\,pB + pq\right) $$

and (A,B) is the center of gravity of the area under
consideration. So, given a contour in an image, if we break
the contour into small line segments (edges) (l_i cos θ_i,
l_i sin θ_i), i = 1, ..., n, then the length of the contour in the
world plane is given by:

$$
L = \sum_{i=1}^{n} L_i
$$

with

$$
L_i = \frac{\beta\, l_i}{1 - Ap - Bq}\,
\sqrt{k_1\cos^2\theta_i + k_2\sin^2\theta_i + k_3\cos\theta_i\sin\theta_i}
$$

$$ k_1 = (1-qB)^2 + (pB)^2 + p^2 $$

$$ k_2 = (1-pA)^2 + (qA)^2 + q^2 $$

$$ k_3 = 2\left((1-qB)\,qA + (1-pA)\,pB + pq\right) $$

and β is the depth of the center of gravity of the world
contour. If we consider now the left and right images of
the contour C (Fig. 3), and we compute the length of the
world contour from each one, we should find the same
answer. In other words, if L_L and L_R are the lengths of the
world contour that we compute from the left and right
image, respectively, we must have

$$
L_L = L_R \qquad (6)
$$
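
The world-length computation can be sketched as follows, assuming the image contour is given as a closed polygon whose edges play the role of the small segments (l_i cos θ_i, l_i sin θ_i). This is an illustrative sketch, not the authors' code; the function name and the polygon representation are assumptions. It uses the identity l² (k_1 cos²θ + k_2 sin²θ + k_3 sinθ cosθ) = k_1 dx² + k_2 dy² + k_3 dx dy, where (dx, dy) is the edge vector, and it assumes 1 - Ap - Bq > 0.

```python
import math

def world_contour_length(vertices, centroid, p, q, beta=1.0):
    """Length on the world plane of a closed image polygon (paraperspective model).

    vertices -- list of (x, y) image points, in order around the contour
    centroid -- (A, B), center of gravity of the image contour
    p, q     -- hypothesized gradient of the world plane
    beta     -- depth of the contour's center of gravity
    """
    A, B = centroid
    k1 = (1 - q * B) ** 2 + (p * B) ** 2 + p ** 2
    k2 = (1 - p * A) ** 2 + (q * A) ** 2 + q ** 2
    k3 = 2 * ((1 - q * B) * q * A + (1 - p * A) * p * B + p * q)
    scale = beta / (1 - A * p - B * q)
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0  # dx = l cos(theta), dy = l sin(theta)
        total += scale * math.sqrt(k1 * dx * dx + k2 * dy * dy + k3 * dx * dy)
    return total
```

For a fronto-parallel plane (p = q = 0) the result reduces to β times the image perimeter, as expected from pure scaling.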


Equation (6) is an equation in the unknowns p and q, but it
is in a complicated form that does not permit easy
algebraic manipulation.

On the other hand, if S_W, S_L and S_R are the areas of the
world contour, the left image contour and the right image
contour respectively, then we have

$$
\frac{S_L}{S_W} = \frac{1}{\beta^2}\,\frac{1 - A_L p - B_L q}{\sqrt{1+p^2+q^2}} \qquad (7)
$$

$$
\frac{S_R}{S_W} = \frac{1}{\beta^2}\,\frac{1 - A_R p - B_R q}{\sqrt{1+p^2+q^2}} \qquad (8)
$$

where (A_L,B_L) and (A_R,B_R) are the centers of gravity of
the left and the right image contour respectively. From (7)
and (8), we conclude

$$
\frac{S_L}{S_R} = \frac{1 - A_L p - B_L q}{1 - A_R p - B_R q} \qquad (9)
$$

Equation (9) represents a straight line in gradient
space, or a great circle in the (equivalent) Gaussian sphere
formalism. Equations (6) and (9) constitute a nonlinear
system in the unknowns p and q. We are currently
working on a theoretical analysis concerning the number
of solutions of this system. Preliminary experimental
results, based on the following discrete method, indicate
that there exists a unique solution. The discrete method we
used is as follows: Equation (9) represents a great circle in
the Gaussian sphere (constant azimuth, varying elevation).
By taking different values for the elevation angle (180
values, if the different values are 1 degree apart) we solve
for the gradient (p,q), and we choose the (p,q) that makes the
function (L_L - L_R)² minimum.
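
The discrete search just described can be sketched as below. This is one possible reading, not the authors' program: the area-ratio equation for the left and right images is cross-multiplied into a line u p + v q = w in gradient space, the line is sampled at equal angular steps along the corresponding great circle of the Gaussian sphere (an assumed parametrization, measured from the point of the line closest to the origin), and the sample minimizing a caller-supplied mismatch, such as (L_L - L_R)², is returned. All function and parameter names are hypothetical.

```python
import math

def line_from_areas(S_L, S_R, centroid_L, centroid_R):
    """Coefficients (u, v, w) of the gradient-space line u*p + v*q = w."""
    (A_L, B_L), (A_R, B_R) = centroid_L, centroid_R
    # S_L (1 - A_R p - B_R q) = S_R (1 - A_L p - B_L q), rearranged.
    return (S_R * A_L - S_L * A_R, S_R * B_L - S_L * B_R, S_R - S_L)

def search_gradient(u, v, w, mismatch, step_deg=1.0):
    """Sample the line at equal great-circle angles; return the best (p, q)."""
    norm2 = u * u + v * v
    if norm2 == 0.0:
        raise ValueError("areas and centroids do not constrain the gradient")
    p0, q0 = w * u / norm2, w * v / norm2          # point of the line nearest the origin
    dp, dq = -v / math.sqrt(norm2), u / math.sqrt(norm2)  # unit direction of the line
    r = math.sqrt(1.0 + p0 * p0 + q0 * q0)         # |(p0, q0, 1)|
    best = None
    n = int(round(180.0 / step_deg))
    for k in range(1, n):                          # angles strictly inside (-90, 90) degrees
        psi = math.radians(-90.0 + k * step_deg)
        t = r * math.tan(psi)                      # arc angle psi maps to offset t on the line
        p, q = p0 + t * dp, q0 + t * dq
        cost = mismatch(p, q)
        if best is None or cost < best[0]:
            best = (cost, p, q)
    return best[1], best[2]
```

In use, `mismatch` would compare the two world-contour lengths computed from the left and right images for the candidate gradient; the one-degree grid only localizes the solution, so a finer step or a local refinement may be needed for accurate estimates.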

So far, we have presented two methods for the
determination of shape from contour, one based on three
views and the relative change of area among the different
views, and the other based on two views (binocular
observer) and the change of area and perimeter of the
contours between the two different frames. We now
proceed to a method for 3-D motion determination
without having to find point to point correspondence
between the successive dynamic frames.

5. Determining 3-D Motion without Correspondence

Here we only treat the case of pure translation. The
general case of rotation and translation can be found in
[Aloimonos & Basu 1985]. The treatment of this section
presumes real perspective projection, not paraperspective.

Consider a coordinate system OXYZ fixed with respect
to the camera, with O the nodal point of the eye and the image
plane perpendicular to the Z axis (focal length 1), which is
pointing along the optical axis (Fig. 4). Let us represent
points on the image plane with small letters ((x,y)) and
points in the world with capital letters ((X,Y,Z)).

Let a point P = (X_1,Y_1,Z_1) in the world have
perspective image (x_1,y_1), where x_1 = X_1/Z_1 and
y_1 = Y_1/Z_1. If the point P moves to the
position P' = (X_2,Y_2,Z_2) with

$$
X_2 = X_1 + \Delta X, \qquad Y_2 = Y_1 + \Delta Y, \qquad Z_2 = Z_1 + \Delta Z
$$

then we desire to find the direction of the translation
(ΔX/ΔZ, ΔY/ΔZ). If the image of P' is (x_2,y_2), then the
observed motion of the world point in the image plane is
given by the displacement vector (x_2 - x_1, y_2 - y_1) (which
in the case of very small motion is also known as optic
flow).

We can easily prove that

$$
x_2 - x_1 = \frac{\Delta X - x_1\,\Delta Z}{Z_1 + \Delta Z}
$$

$$
y_2 - y_1 = \frac{\Delta Y - y_1\,\Delta Z}{Z_1 + \Delta Z}
$$

Under the assumption that the depth is large (and the
motion in depth small), the equations above become:

$$
x_2 - x_1 = \frac{\Delta X - x_1\,\Delta Z}{Z_1} \qquad (10)
$$

$$
y_2 - y_1 = \frac{\Delta Y - y_1\,\Delta Z}{Z_1} \qquad (11)
$$

All the published methods for the recovery of the
direction (ΔX/ΔZ, ΔY/ΔZ) are based on the above
equations (10) and (11) (see [Ullman 1979, Longuet-
Higgins 1981, Tsai & Huang 1984, Bandyopadhyay &
Aloimonos 1985]), which of course require knowledge
of the correspondence between points in the successive
frames. In the next section, we present a method for the
recovery of the translational direction of a moving planar
contour (ΔX/ΔZ, ΔY/ΔZ), without having to solve the
correspondence problem.

6. Motion of a Planar Contour without Correspondence

Consider again a coordinate system OXYZ fixed with
respect to the camera, and a contour C on a plane Z = pX
+ qY + c that is moving along the vector (ΔX, ΔY, ΔZ),
and let C_1 and C_2 be the two successive images of the
contour C (see Fig. 5). We suppose that the orientation of
the contour's world plane is already known (i.e., it has been
found by one of the two methods already presented). In
what follows, to facilitate analysis, we will present a
discrete analysis (i.e., we will talk about summation over all
discrete points of the contour, instead of integration along
the contour).

Consider a point (x_i,y_i) on contour C_1 (the first frame
of the sequence), which moves to a point (x_j,y_j) on contour
C_2. For the moment we do not worry about where the
point (x_j,y_j) is on the second contour C_2. The only
important thing is that (x_j,y_j) ∈ C_2 and it is the
corresponding point of (x_i,y_i) ∈ C_1. From that, we have

$$
x_j - x_i = \frac{\Delta X - x_i\,\Delta Z}{Z_i} \qquad (12)
$$

$$
y_j - y_i = \frac{\Delta Y - y_i\,\Delta Z}{Z_i} \qquad (13)
$$

where Z_i is the depth of the contour point whose image is
the point (x_i,y_i). Taking into account that

$$
Z_i = pX_i + qY_i + c, \quad\text{or}\quad
1 = p x_i + q y_i + \frac{c}{Z_i}, \quad\text{or}\quad
\frac{1}{Z_i} = \frac{1}{c}\,(1 - p x_i - q y_i),
$$

equations (12) and (13) become:

$$
x_j - x_i = \frac{\Delta X - x_i\,\Delta Z}{c}\,(1 - p x_i - q y_i) \qquad (14)
$$

$$
y_j - y_i = \frac{\Delta Y - y_i\,\Delta Z}{c}\,(1 - p x_i - q y_i) \qquad (15)
$$

Equations (14) and (15) relate the x and y coordinates
respectively, of two corresponding points, the first on
contour C_1 and the second on contour C_2.

If we write equation (14) for all the points on the two
contours, and we sum up all these equations, we get the
following:

$$
\sum_j x_j - \sum_i x_i = \sum_i \frac{\Delta X - x_i\,\Delta Z}{c}\,(1 - p x_i - q y_i) \qquad (16)
$$

In equation (16), Σ_j x_j denotes the sum of the x
coordinates of all the points on contour C_2, Σ_i x_i denotes
the sum of the x coordinates of all the points on contour
C_1, and the right hand side of the above equation is
summed over all points of contour C_1. Equation (16)
becomes:

$$
c\left(\sum_j x_j - \sum_i x_i\right)
= \Delta X \sum_i (1 - p x_i - q y_i) - \Delta Z \sum_i x_i\,(1 - p x_i - q y_i) \qquad (17)
$$

In an analogous way, working with equation (15) we get:

$$
c\left(\sum_j y_j - \sum_i y_i\right)
= \Delta Y \sum_i (1 - p x_i - q y_i) - \Delta Z \sum_i y_i\,(1 - p x_i - q y_i) \qquad (18)
$$

From equations (17) and (18) we conclude:

$$
\frac{\sum_j x_j - \sum_i x_i}{\sum_j y_j - \sum_i y_i}
= \frac{\dfrac{\Delta X}{\Delta Z}\sum_i (1 - p x_i - q y_i) - \sum_i x_i\,(1 - p x_i - q y_i)}
       {\dfrac{\Delta Y}{\Delta Z}\sum_i (1 - p x_i - q y_i) - \sum_i y_i\,(1 - p x_i - q y_i)}
\qquad (19)
$$

Equation (19) is a linear equation in the unknowns
ΔX/ΔZ and ΔY/ΔZ. This equation is due to the motion
of the contour in one image frame. With a binocular
observer, we get two linear equations (Eq. 19) from the
motion of the contour in both the left and right images,
which give a unique solution for the direction of the
translation (ΔX/ΔZ, ΔY/ΔZ), without using point to
point correspondence. Obviously, to apply this method to
natural images, one would have to solve the problem of
contour correspondence (macro-correspondence), which
seems easier than point to point correspondence.
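
A minimal sketch of the binocular solution follows; it is an illustration, not the authors' implementation. For each camera, the ratio of summed x-displacements to summed y-displacements is cross-multiplied into one linear equation in u = ΔX/ΔZ and v = ΔY/ΔZ, and the two equations are solved by Cramer's rule. The function names are hypothetical, and the sketch assumes that both frames of each camera sample the same contour points (in any order) and that the plane gradient (p, q) of Z = pX + qY + c is expressed in each camera's own coordinate frame.

```python
def motion_constraint(frame1, frame2, p, q):
    """Coefficients (a, b, rhs) of a*u + b*v = rhs for one camera.

    frame1, frame2 -- lists of (x, y) contour points before and after the motion;
                      only their sums are used, so no point matching is needed.
    """
    Dx = sum(x for x, _ in frame2) - sum(x for x, _ in frame1)
    Dy = sum(y for _, y in frame2) - sum(y for _, y in frame1)
    w = [1 - p * x - q * y for x, y in frame1]
    N = sum(w)
    Mx = sum(x * wi for (x, _), wi in zip(frame1, w))
    My = sum(y * wi for (_, y), wi in zip(frame1, w))
    # Dx / Dy = (u N - Mx) / (v N - My), cross-multiplied:
    return Dy * N, -Dx * N, Dy * Mx - Dx * My

def translation_direction(left, right, p, q, eps=1e-15):
    """Solve the left and right constraints for (dX/dZ, dY/dZ).

    left, right -- (frame1, frame2) pairs for the two cameras
    """
    a1, b1, r1 = motion_constraint(left[0], left[1], p, q)
    a2, b2, r2 = motion_constraint(right[0], right[1], p, q)
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        raise ValueError("the two constraints are not independent")
    return (r1 * b2 - b1 * r2) / det, (a1 * r2 - a2 * r1) / det
```

Because only coordinate sums enter, the second-frame points may be supplied in any order, which is what removes the need for point to point correspondence.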

Finally, experimental results based on this method are
very accurate and robust. A recent method presented by
Kanatani [1985a, 1985b] has numerical instabilities that
affect the desired result a great deal.


Figures 6-10 show results of binocular and trinocular
experiments. Figure 6 shows the perspective images of a
planar contour taken by three cameras at the positions
(0,0), (0,50) and (50,0) respectively. The actual orientation
of the contour in space was given by the gradient (p,q) =
(15, 25). The computed orientation was (p,q) = (14.99,
24.99). Figure 7 shows again the perspective images of a
planar contour taken by three cameras at the positions
(0,0), (0,50) and (50,0) respectively. The actual orientation
of the contour in space was (p,q) = (30, 5) and the
estimated orientation was (p,q) = (30, 4.99). Figure 8
shows the images of a translating planar contour (human
figure) taken by a binocular system at two different time
instants. The actual orientation of the contour in space was
(p,q) = (10, 5) and the actual direction of translation
(dx/dz, dy/dz) = (-4, 6). Our program recovered
orientation (p,q) = (10.00007, 5.000297) and direction of
translation (dx/dz, dy/dz) = (-4.000309, 6.00463). Figure
9 shows again the perspective images of a translating
planar contour taken by a binocular system at two
different time instants. The actual orientation of the
contour was (p,q) = (-25, 30) and the direction of
translation (dx/dz, dy/dz) = (50, 60). The computed
orientation from these images was (p,q) = (-24.99,
30.000021) and the computed direction of translation
(dx/dz, dy/dz) = (49.858421, 59.830266). Finally, Figure
10 shows the perspective images of a translating planar
contour taken by a binocular system at two different times.
The actual orientation of the contour was (p,q) =
(10, -11) and the direction of translation (dx/dz, dy/dz) =
(1.66, 3.33). The estimated parameters from these images
were (p,q) = (9.99, -11.000383) and (dx/dz, dy/dz) =
(1.66, 3.33).




Acknowledgments 

Our thanks go to Dana Ballard and Amit
Bandyopadhyay for their help during the preparation of
this paper. This research was sponsored by the Defense
Advanced Research Projects Agency under Grants
DACA76-85-C-0001 and N00014-82-K-0193.

References 

Aloimonos, J. and Basu, A., "Shape and motion from
contour," forthcoming Technical Report, Dept. of
Computer Science, Univ. of Rochester, 1985.

Aloimonos, J. and Chou, P., "Detection of surface
orientation and motion from texture," TR161,
Dept. of Computer Science, Univ. of Rochester,
January 1985.

Aloimonos, J. and Swain, M., "Shape from texture,"
Proceedings, IJCAI-85, Los Angeles, CA, Aug. 1985.

Bandyopadhyay, A. and Aloimonos, J., "Perception of
structure and motion of rigid objects," TR169,
Dept. of Computer Science, Univ. of Rochester,
1985.

Barrow, H.G. and Tenenbaum, J.M., "Recovering intrinsic
scene characteristics from images," in Computer
Vision Systems, A. Hanson and E. Riseman (eds.),
Academic Press, New York, 1978.

Brady, M. and Yuille, A., "An extremum principle for
shape from contour," IEEE Trans. Pattern Analysis
and Machine Intelligence, 6, 288-301, 1984.

Brown, C.M., Aloimonos, J., Swain, M., Chou, P. and
Basu, A., "Texture, Contour, Shape and Motion,"
submitted to Pattern Recognition Letters,
September 1985.

Horn, B.K.P., "Understanding image intensities," Artificial
Intelligence 8 (2), 201-231, 1977.

Ikeuchi, K., "Shape from regular patterns," Artificial
Intelligence 22, 49-75, 1984.

Ikeuchi, K. and Horn, B.K.P., "Numerical shape from
shading and occluding boundaries," Artificial
Intelligence 17, 141-184, 1981.

Kanade, T., "Recovery of the three dimensional shape of
an object from a single view," Artificial Intelligence
17, 409-460, 1981.

Kanatani, K., "Tracing planar surface motion from
projection without knowing correspondence,"
CVGIP, 29, 1-12, 1985a.

Kanatani, K., "Detecting the motion of a planar surface by
line and surface integrals," CVGIP, 29, 13-22,
1985b.

Kender, J.R., "Shape from texture: an aggregation
transform that maps a class of textures into surface
orientation," Proceedings, IJCAI, 475-480, 1980.



Kender, J.R., "Shape from texture: A computational
paradigm," Proceedings, DARPA Image
Understanding Workshop, April 1979, 79-84.

Longuet-Higgins, H.C., "A computer algorithm for
reconstructing a scene from two projections,"
Nature 293, 133-135, 1981.

Ohta, Y., Maenobu, K., and Sakai, T., "Obtaining surface
orientation from texels under perspective
projection," Proceedings, IJCAI, 746-751, 1981.

Tsai, R.Y. and Huang, T.S., "Uniqueness and estimation
of three-dimensional motion parameters of rigid
objects with curved surfaces," IEEE Trans. on
Pattern Analysis and Machine Intelligence, 6, 13-27,
1984.

Ullman, S., "The Interpretation of Visual Motion," MIT
Press, Cambridge, 1979.

Witkin, A., "Recovering surface shape and orientation
from texture," Artificial Intelligence 17, 17-45, 1981.




EPIPOLAR-PLANE  IMAGE  ANALYSIS: 

A  TECHNIQUE  FOR  ANALYZING  MOTION  SEQUENCES* 


Robert  C.  Bolles 
H. Harlyn Baker
SRI  International 
333  Ravenswood  Avenue 
Menlo  Park,  CA  94025. 


Abstract 


A technique for unifying spatial and temporal analysis of an image sequence taken by a camera moving
in a straight line is presented. The technique is based on a "dense" sequence of images - images taken close
enough together to form a solid block of data. Slices of this solid directly encode changes due to motion of the
camera. These slices, which have one spatial dimension and one temporal dimension, are more structured than
conventional images. This additional structure makes them easier to analyze. We present the theory behind this
technique, describe an initial implementation, and discuss our preliminary results.


Introduction 

Most motion-detection techniques (e.g., [Barnard 1980],
[Haynes 1983], and [Hildreth 1984]) analyze pairs of
images, and hence are fundamentally similar to
conventional stereo techniques. A few researchers have
considered sequences of three or more images (e.g.,
[Nevatia 1976], [Ullman 1979], and [Yen 1983]), but still
the process is one of matching discrete items at discrete
times. And yet, it is widely acknowledged that there is a
potential benefit from unifying the analysis of spatial and
temporal information. In this paper we present a
technique to perform this type of unification for
straight-line motions.

Motion-analysis techniques using pairs or triples of
images are designed to process images that contain
significant changes from one to another - features may move
more than 20 pixels between views. These large changes
force the techniques to tackle the difficult problem of
stereo correspondence (Figure 1 shows an image triple
with a typical inter-frame separation). Our idea, on the
other hand, is to take a sequence of images from
positions that are very close together - close enough that
almost nothing changes from one image to the next. In
particular, we take images close enough together that
none of the image features moves more than a pixel or
so (Figure 2 shows the first three images from one of our
sequences containing 125 images). This sampling
frequency guarantees a continuity in the temporal domain
that is similar to continuity in the spatial domain. Thus,
an edge of an object in one image appears temporally
adjacent to (within a pixel of) its occurrence in both the
preceding and following images. This temporal
continuity makes it possible to construct a solid of data in which
time is the third dimension and continuity is maintained
over all three dimensions (see Figure 3). This solid of
data is referred to as spatio-temporal data.

The traditional motion-analysis paradigm detects
features in spatial images (i.e., the uv images in Figure 3),
matches them from image to image, and then deduces the
motion. We, however, propose an approach that is
orthogonal to this. We suggest slicing the spatio-temporal
data along a temporal dimension (see Figure 4),
locating features in these slices, and then computing
three-dimensional locations. Our reasoning is that the
temporal image slices can be formed in such a way that they
contain more structure than spatial images; thus, they
are more predictable and, hence, easier to analyze.

*This research was supported by DARPA Contracts
MDA 903-83-C-0027 and DACA 76-85-C-0004.

To convince you of the utility of this approach, we must
demonstrate that there is an interesting class of motions
for which we can build structured temporal images. In
the next section we show that this can be done
whenever the camera moves in a straight line. We call these
temporal images epipolar-plane images, or EPIs, from
their geometric properties. In Section 3 we describe the
results of our experiments in computing the depths of
objects from their paths through the EPIs. And finally, in
Section 4 we discuss the strengths and weaknesses of the
technique and outline some current and future directions
for our work.

Epipolar-Plane Images

In this section we define an epipolar-plane image (an
EPI) and explain our interest in it. First, however, we
review some stereo terminology. Consider Figure 5, which
is a diagram of a general stereo configuration. The two
cameras are modeled as pin-holes with the image planes
in front of the lenses. For each point P in the scene,
there is a plane, called the epipolar plane, which passes
through the point and the line joining the two lens
centers. This plane intersects the two image planes along
epipolar lines. All the points in the epipolar plane are
projected onto one epipolar line in the first image and a
corresponding epipolar line in the second image. The
importance of these lines for stereo processing is that they
reduce the search required to find matching points from
two dimensions to one. Thus, to find a match for a point
along one epipolar line in an image it is only necessary to
search along the corresponding epipolar line in the other
image. This is termed the epipolar constraint.

One further definition that is essential to understanding
our approach is that of an epipole. An epipole in a stereo
configuration is the intersection of the line joining the
lens centers and an image plane (see Figure 5). In
motion analysis, an epipole is often referred to as a focus of
expansion (FOE) because the epipolar lines radiate from
it.




Consider a simple motion in which a camera moves from
right to left, with its optical axis orthogonal to its
direction of motion (see Figure 6). For this type of motion
the epipolar plane for a point, such as P, is the same for
all pairs of camera positions, and we refer to that plane
as the epipolar plane for P for the whole motion.

The epipolar lines associated with one of these
epipolar planes are horizontal scan lines in the images (see
Figure 6). The projection of P onto these epipolar lines
moves to the right as the camera moves to the left. The
velocity of this movement along the epipolar line is a
function of P's distance from the line joining the lens
centers. The closer it is, the faster it moves.

For this motion, the epipolar lines are not only
horizontal, they occur at the same vertical position in all the
images. Therefore, a horizontal slice of the spatio-temporal
data formed from this motion contains all the epipolar
lines associated with one epipolar plane (see Figure 7).

Figure 7 shows three of the images used to form the
solid of data. Typically a hundred or more images are
used, making P’s trajectory through the data a continuous
path, as indicated in the diagram. For this type of
lateral motion, if the camera moves a constant distance
between images, the trajectories are straight lines (see
Appendix A).

Figure 8 shows a horizontal slice through the solid of
data shown in Figure 3, which was constructed from a
sequence of 125 images taken by a camera moving from
right to left. Figure 9 shows a frontal view of that slice.
We call this type of image an epipolar-plane image (EPI)
because it is composed of one-dimensional projections
of the world points lying on an epipolar plane. Each
horizontal line of the image is one of these projections.
Thus, time progresses from bottom to top, and, as the
camera moves to the left, the features move to the right.
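
The slicing that produces an EPI can be sketched in a few lines. The following is a minimal illustration, not the authors' program: it assumes the image sequence is held in a NumPy array indexed [frame, row, column], and the names `make_synthetic_sequence` and `extract_epi` are hypothetical. For a lateral motion, fixing one row of the volume yields one EPI whose rows are successive frames.

```python
import numpy as np

def make_synthetic_sequence(num_frames=20, height=8, width=40, row=3, start_col=5):
    """Build a toy spatio-temporal volume indexed [frame, row, column].

    A single bright pixel on one scanline moves one column to the right
    per frame, mimicking a feature seen by a camera moving to the left.
    """
    volume = np.zeros((num_frames, height, width))
    for t in range(num_frames):
        volume[t, row, start_col + t] = 1.0
    return volume

def extract_epi(volume, row):
    """Return the EPI for one scanline: a 2-D array indexed [frame, column]."""
    return volume[:, row, :]

volume = make_synthetic_sequence()
epi = extract_epi(volume, row=3)
# Column of the feature in each frame; it advances linearly with time,
# so the feature traces a straight line in the EPI.
track = np.argmax(epi, axis=1)
```

In a display such as Figure 9 the first frame is drawn at the bottom, so the array would be flipped vertically before plotting.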

There  are  several  things  to  notice  about  this  image.  First, 
it  contains  only  linear  structures.  In  this  respect  it  is 
much  simpler  than  the  spatial  images  used  to  create  it 
(see  Figure  1  for  comparison).  Second,  the  slopes  of  the 
lines  determine  the  distances  to  the  corresponding  fea¬ 
tures  in  the  world.  The  greater  the  slope,  the  farther  the 
feature. Third, occlusion, which occurs when a closer
feature moves in front of a more distant one, is immediately
apparent in this representation. For example, the
narrow white bar at the left center of the EPI in Figure 9
is initially occluded, then it is visible for a while
until it is occluded briefly by a thin light object, then
visible  again  before  being  rapidly  occluded  twice  by  two 
darker  objects,  and  then  is  continuously  visible  until  the 
end  of  the  sequence.  Thus,  the  same  object  is  seen  four 
different  times. 

Figure  10  shows  another  EPI  sliced  from  the  data  in 
Figure  3.  Its  basic  structure  is  the  same  as  Figure  9; 
however,  it  illustrates  the  variety  of  patterns  that  can 
occur  in  an  EPI. 

The EPIs in Figures 9 and 10 were constructed from a
simple right-to-left motion with the camera oriented at
right angles to the motion. For what other types of motions
can EPIs be constructed? The answer is that they
can be constructed for any straight-line motion. As long
as the lens center of the camera moves in a straight line
the epipolar planes remain fixed relative to the scene.
The points in each of these planes function as a unit.
They are projected onto one line in the first image, another
line in the second image, and so on. The camera
can even change its orientation about its lens center as it
moves along the line without affecting this partitioning
of the scene. Orientation changes move the epipolar lines
around in the image plane, significantly complicating the
construction of the EPIs, but the epipolar planes remain
unchanged since the line joining the lens centers remains
fixed.

Figure 11 is an EPI formed from a sequence of images
taken by a camera moving forward and looking straight
ahead. Again the image is very structured, except that,
instead of lines, it is composed of curves. For this type
of motion, in fact for any straight-line motion in which
the camera is at a fixed orientation relative to the
direction of motion (see Figure 12), the trajectories in the
EPIs are hyperbolas (see Appendix B). But not only are
they hyperbolas, they are simple hyperbolas in the sense
that their asymptotes are vertical and horizontal lines.
A  right-to-left  motion,  such  as  the  one  mentioned  above, 
is  just  a  special  case  in  which  the  hyperbolas  degenerate 
into  lines. 
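
The hyperbolic form can be checked numerically. The sketch below is illustrative only: it generates a forward-motion trajectory from the similar-triangle relation of Appendix B (the distance w from the epipole satisfies w/h = C/(e - s t)) and then fits the simple form Δw Δt + a Δw + b Δt = 0 by linear least squares. The parameter values are arbitrary and the function names are hypothetical.

```python
import numpy as np

def simulate_forward_trajectory(h, C, e, s, times):
    """Distance w of the projected point from the epipole at each time.

    Uses w / h = C / (e - s * t), valid while the camera has not yet
    reached the plane through the point (s * t < e).
    """
    times = np.asarray(times, dtype=float)
    return h * C / (e - s * times)

def fit_hyperbola(times, w):
    """Least-squares fit of dw*dt + a*dw + b*dt = 0, relative to the first sample."""
    dt = times[1:] - times[0]
    dw = w[1:] - w[0]
    design = np.column_stack([dw, dt])   # unknowns are the coefficients a and b
    rhs = -dw * dt
    (a, b), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return a, b

speed = 1.0
times = np.arange(0.0, 8.0, 0.5)
w = simulate_forward_trajectory(h=1.0, C=2.0, e=10.0, s=speed, times=times)
a, b = fit_hyperbola(times, w)
# For this form, a = -e/s and b = h*C/e, so the distance e follows from a.
e_estimate = -a * speed
```

Because the fit is linear in a and b, every point on a trajectory can contribute to the estimate, which is the precision benefit noted in Appendix B.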

If the lens center does not move in a line, the epipolar
planes passing through a world point differ from one
camera position to the next. The points in the scene are
grouped one way for one pair of camera positions and a
different way for another pair of positions. This makes
it impossible to partition the scene into a fixed set of
planes, which in turn means that it is not possible to
construct EPIs for such a motion.

One last observation about EPIs: since an EPI contains
all the information about the features in a slice of the
world, the analysis of a scene can be partitioned into a
set of analyses, one for each slice. In the case of a
right-to-left motion, there is one analysis for each scanline in
the image sequence. This ability to partition the analysis
is one of the key properties of our motion-analysis
technique. Slices of the spatio-temporal data can be
analyzed independently (and possibly in parallel), and then
the results can be combined into a three-dimensional
representation of the scene.


Experimental  Results 

We have implemented a program that computes three-
dimensional locations of world features by analyzing EPIs
constructed  from  right-to-left  motions.  The  program 
currently  consists  of  the  following  steps: 

1.  3D smoothing of the spatio-temporal data

2.  Slicing  the  data  into  EPIs 

3.  Detecting  edges,  peaks,  and  troughs 

4.  Segmenting  edges  into  linear  features 

5.  Merging  collinear  features 

6.  Computing  x-y-z  coordinates 

7.  Building  a  map  of  free  space 

8.  Linking x-y-z points between EPIs

In  this  section  we  illustrate  the  behavior  of  this  program 
by  applying  it  to  the  data  shown  in  Figure  3. 

The first step smooths the three-dimensional data to
reduce the effects of noise and camera jitter, and to
determine the temporal contours subsequently to be
used as features. This is done by applying a sequence
of three one-dimensional Gaussians ([Buxton 1983] and
[Buxton 1985] explore other uses of spatio-temporal
convolution).
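
Three one-dimensional Gaussian passes, one along each axis of the data, are equivalent to a single separable three-dimensional Gaussian. The sketch below illustrates the idea with NumPy. The kernel radius (three standard deviations), the zero-padded borders, and the function names are assumptions of this example; the parameters actually used by the program are not stated here.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def smooth_separable(volume, sigmas):
    """Smooth a [frame, row, column] volume with one 1-D pass per axis.

    Borders are implicitly zero-padded by np.convolve(mode="same").
    Each axis must be at least as long as its kernel.
    """
    out = np.asarray(volume, dtype=float)
    for axis, sigma in enumerate(sigmas):
        kernel = gaussian_kernel(sigma)
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, out
        )
    return out

# An impulse in the middle of a small volume spreads into a 3-D Gaussian blob.
volume = np.zeros((16, 16, 16))
volume[8, 8, 8] = 1.0
smoothed = smooth_separable(volume, sigmas=(1.0, 1.0, 1.0))
```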




The second step forms EPIs from the spatio-temporal
data. For a lateral motion this is straightforward because
the EPIs are horizontal slices of the data. Figure 9 shows
the EPI selected to illustrate steps three through seven.

The third step detects edge-like features in the EPI. It
currently locates four types of features: positive and
negative zero-crossings [Marr 1980] and peaks and troughs
in the difference of Gaussians. The zero-crossings
indicate places in the EPI where there is a sharp change in
image intensity, typically at surface boundaries or surface
markings, and the peaks/troughs occur between these
zero-crossings. The former are generally more precisely
positioned than the latter. Figure 13 shows all four types
of features detected in the EPI shown in Figure 9.

The fourth step fits linear segments to the edges. It
does this in two passes. The first pass partitions the
edges at sharp corners by analyzing curvature estimates
along the edges. The second pass applies Ramer's
algorithm [Ramer 1972] to recursively partition the smooth
segments into line segments. Figure 14 shows the line
segments derived from the edges in Figure 13.
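
The recursive partitioning in the second pass can be sketched as follows. This is a generic rendering of Ramer's splitting procedure as it is commonly described, not the authors' code: the chain of edge points is split at the point farthest from the chord joining its endpoints until every point lies within a tolerance of its segment. The tolerance value and function names are hypothetical.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def ramer(points, tolerance):
    """Approximate an ordered chain of points by line-segment endpoints."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior point farthest from the chord a-b.
    index, dmax = max(
        ((i, point_line_distance(points[i], a, b)) for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    if dmax <= tolerance:
        return [a, b]
    left = ramer(points[: index + 1], tolerance)
    right = ramer(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point

# An L-shaped edge is reduced to its two straight segments.
edge = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
segments = ramer(edge, tolerance=0.1)
```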

The  fifth  step  builds  a  description  of  the  line  segments 
that  links  together  those  that  are  collinear.  The  intent  is 
to  identify  sets  of  lines  that  belong  to  the  same  feature 
in  the  world.  By  bridging  gaps  caused  by  occlusions, 
the  program  can  improve  its  estimates  of  the  features' 
locations  as  well  as  extract  clues  about  the  nature  of  the 
surfaces  in  the  scene.  The  program  only  links  together 
features  of  the  same  type,  except  that  positive  and  neg¬ 
ative  zero  crossings  may  be  joined,  since  the  contrast 
across  an  edge  can  differ  from  one  view  to  the  next.  Fig¬ 
ure  15  shows  the  peak  features  from  Figure  14  that  are 
linked  together  by  the  program. 

The line intersections in Figures 14 and 15 indicate temporal
occlusions. For each intersection, the feature with
the smaller slope is the one that occludes the other.

The sixth step computes the x-y-z locations of the world
features corresponding to the EPI features. The world
coordinates are uniquely determined by the location of
the epipolar plane associated with the EPI and the slope
and intercept of the line in the EPI. To display these
three-dimensional locations, the program plots the two-
dimensional coordinates of the features in the epipolar
plane. Figure 16 shows the epipolar plane coordinates for
the features shown in Figure 14. The shape and size of
each ellipse depicts the error associated with the feature's
location (this depends on the length of the line and the
variance of the fit).

The seventh step builds a two-dimensional map of the
world that indicates which areas are empty (also see
[Bridwell 1983]). This construction is demonstrated for
a few points in Figure 17, where the Z = 0 axis is the
camera path. The principle here is that if a feature is
seen continuously over some interval by a moving camera,
then during that motion nothing occludes it. Since
nothing occludes it, nothing lies in front of it, and the
triangle in the scene defined by the feature and its first
and last points of observation is empty space. We build
a map of this free space by constructing one of these
triangular regions for each line segment found in the EPI,
and then ORing these all together. Notice in Figure 17
that one of the features is viewed once while the other
is seen in two distinct intervals, and so gives rise to two
free-space triangles. Figure 18 shows the full free-space
map constructed for the EPI features of Figure 16.
Taking the intersection of the free-space maps from
these individual EPIs, perhaps over some vertical interval,
would give us the known free-space volume in that
interval. This would be useful for navigation, as we know
we could move freely in that volume without running into
obstacles.
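
The ORing of free-space triangles within one EPI can be sketched on a boolean grid. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the grid resolution, the tuple layout of an observation, and the function names are hypothetical, and the camera path is taken to lie on the z = 0 axis as in Figure 17.

```python
import numpy as np

def triangle_mask(xs, zs, p0, p1, p2):
    """Boolean grid [z index, x index]: True where the cell lies in the triangle."""
    X, Z = np.meshgrid(xs, zs)

    def side(a, b):
        # Sign of the cross product tells which side of edge a-b a cell is on.
        return (b[0] - a[0]) * (Z - a[1]) - (b[1] - a[1]) * (X - a[0])

    d0, d1, d2 = side(p0, p1), side(p1, p2), side(p2, p0)
    has_negative = (d0 < 0) | (d1 < 0) | (d2 < 0)
    has_positive = (d0 > 0) | (d1 > 0) | (d2 > 0)
    return ~(has_negative & has_positive)

def free_space_map(xs, zs, observations):
    """OR together one triangle per continuously observed feature.

    Each observation is (feature_x, feature_z, first_camera_x, last_camera_x).
    """
    free = np.zeros((len(zs), len(xs)), dtype=bool)
    for fx, fz, x_first, x_last in observations:
        free |= triangle_mask(xs, zs, (x_first, 0.0), (x_last, 0.0), (fx, fz))
    return free

xs = np.linspace(0.0, 10.0, 11)
zs = np.linspace(0.0, 10.0, 11)
# One feature at (x=5, z=10) seen continuously while the camera moves from x=0 to x=10.
free = free_space_map(xs, zs, [(5.0, 10.0, 0.0, 10.0)])
```

Maps built this way for several EPIs could then be combined with a logical AND to obtain the region that is free at every height in a vertical interval.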

Figures 19 through 22 show the processing for the EPI
30  lines  from  the  bottom  of  the  uv  images.  This  slice 
contains  a  plant  on  the  left,  a  shirt  draped  over  a  chair, 
part  of  the  top  of  a  table,  and  in  the  right  foreground,  a 
ladder. 

Figure 23 is a stereo (crossed-eye) display, showing some
preliminary results in the eighth step of our analysis -
combining the spatial data from the individual EPIs.
For spatial continuity, we link points between the various
EPIs (nearest neighbors in overlapping error ellipses).
Figure  23  displays  those  features  whose  total  baseline 
is  greater  than  3  inches,  and  whose  connected  extent 
vertically  is  greater  than  2  scanlines. 

As a final depiction of our results in this depth from cam¬
era  motion  analysis,  we  show  in  Figure  24  all  mapped  fea¬ 
tures  visible  from  the  90th  frame  of  the  sequence.  Fig¬ 
ure  25  shows  these  features  separated  into  the  specific 
depth  ranges  indicated. 


Discussion

The  following  valuable  characteristics  of  this  approach 
should  be  noted: 

•  Spatial  and  temporal  data  are  treated  together  as 
a  single  unit; 

•  The  acquisition  and  tracking  steps  of  the  conven¬ 
tional  motion  analysis  paradigm  are  merged  into 
one  step; 

•  The approach is feature-based, but is not restricted
to point features - linear features that are
perpendicular to the direction of motion can also be used;

•  There is more structure in an EPI than in a standard
spatial image, which means that it is easier
to analyze, and hence easier to interpret;

•  Occlusion is manifested in an EPI in a way that
increases the chance of detection because the edge is
viewed over time against a variety of backgrounds;

•  EPIs facilitate the segmentation of a scene into
opaque objects occurring at different depths because
they encode a homogeneous slice of the object
over time;

•  There are some obvious ways to make the analysis
incremental in time and partitionable in y (epipolar
planes), for high speed performance.

With these benefits, the inherent limitations and current
restrictions must be borne in mind:

•  Motion must be in a straight line and (currently)
the camera must be at a fixed angle relative to the
direction of motion;

•  Frame rate must be high enough to limit the frame-
to-frame changes to a pixel or so (more specifically,
such that the projective width of a surface
is greater than its motion);

•  Independently moving objects will either not be
detected, or will be detected inaccurately.




We are currently investigating the following areas:

•  Extending our analysis of connectivity between
adjacent EPIs - this seems to be best handled by not
losing the information in the first place, that is, by
making explicit the feature connectivity in space
as well as time.

•  Identifying and interpreting spatial and temporal
phenomena such as occlusions, shadows, mirrors,
and highlights.

•  Characterizing the appearance of curved surfaces
in EPIs.

•  Implementing the analysis of EPIs derived from
forward motions.

•  Providing more dense depth information by, for
example, tracking intensity levels.

•  Making the analysis incremental in T, rather than
Y - that is, processing spatial images over time,
rather than requiring the acquisition of all images,
and then processing by EPI slices.


Appendix A: Lateral-Motion Trajectories
In this appendix we first derive an equation for the
trajectory of a point in an EPI constructed from a lateral
motion, and then show how to compute the (x,y,z)
location of such a point. Figure 26 is a diagram of a
trajectory in an EPI derived from the right-to-left motion
illustrated in Figure 27. The scanline at t_1 in Figure 26
corresponds to the epipolar line l_1 in Figure 27. Similarly,
the scanline at t_2 corresponds to the epipolar line
l_2. (Recall that the EPI is constructed by extracting one
line from each image taken by the camera as it moves
along the line joining c_1 and c_2. Since the images are
taken very close together in time, there would be several
images taken between c_1 and c_2. However, to simplify
the diagram none of these is shown.) The point (u_1,t_1)
in the EPI corresponds to the point (u_1,v_1) in the image
taken by the camera at time t_1 and position c_1. Thus,
as the camera moves from c_1 to c_2 in the time interval
t_1 to t_2, the scene point moves in the EPI from (u_1,t_1)
to (u_2,t_2). The intent of this section is to characterize
the shape of this trajectory and then compute the three-
dimensional position of the corresponding scene point,
given the focal length of the camera, the camera speed,
and the coordinates of points along the trajectory.

For our analysis we define a left-handed coordinate system
that is centered on the initial position of the camera
(i.e., c_1 in Figure 27). The shape of the trajectory can
be derived by analyzing the geometric relationships in
the epipolar plane that passes through P. Figure 28 is a
diagram of that plane.

Given the speed of the camera, s, which is assumed to
be constant, the distance from c_1 to c_2, Δx, can be
computed as follows:

\[ \Delta x = s \, \Delta t \tag{1} \]

where Δt is (t_2 - t_1). By similar triangles

\[ \frac{u_1}{h} = \frac{x}{D} \tag{2} \]

\[ \frac{u_2}{h} = \frac{\Delta x + x}{D} \tag{3} \]


where u_1 and u_2 have been converted from pixel values
into distances on the image plane, h is the distance from
the lens center to the epipolar line in the image plane, x
is the x-coordinate of P in the scene coordinate system,
and D is the distance from P to the line joining the lens
centers. Since h is the hypotenuse of a right triangle, it
can be computed as follows:

\[ h = \sqrt{f^2 + v_1^2}, \tag{4} \]

where f is the focal length of the camera. From 2 and 3
we get

\[ \Delta u = (u_2 - u_1) = \frac{h(\Delta x + x)}{D} - \frac{h x}{D} = \frac{h \, \Delta x}{D} \tag{5} \]

Thus, Δu is a linear function of Δx. Since Δt is also a
linear function of Δx, Δt is linearly related to Δu, which
means that trajectories in an EPI derived from a lateral
motion are straight lines.


The (x,y,z) position of P can be computed by scaling
u_1, v_1, and f appropriately. From 5 we define

\[ m = \frac{D}{h} = \frac{\Delta x}{\Delta u} \tag{6} \]

which represents the slope of the trajectory computed
in terms of the distance traveled by the camera (Δx as
opposed to Δt) and the distance the point moved along
the epipolar line (i.e., Δu). From similar triangles

\[ (x, y, z) = \left( \frac{D}{h} u_1, \; \frac{D}{h} v_1, \; \frac{D}{h} f \right) \tag{7} \]

which means that

\[ (x, y, z) = (m u_1, \; m v_1, \; m f) \tag{8} \]

If the first camera position, c_1, on an observed trajectory
is different from the camera position, c_0, that defines a
global camera coordinate system, the x coordinate can
be adjusted by an amount equal to the distance traveled
from c_0 to c_1. Thus,

\[ (x, y, z) = ((t_1 - t_0) s + m u_1, \; m v_1, \; m f) \tag{9} \]

where t_0 is the time of the first image and s is the speed of
the camera. This correction is equivalent to computing
the x intercept of the line and using it as the first camera
position. Therefore, for a lateral motion, the trajectories
are linear and the (x,y,z) coordinates of the points can
be easily computed from the slopes and intercepts of the
lines. 
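
As a worked illustration of Equations 1, 6, and 9, the sketch below recovers a scene point from two samples on its EPI trajectory. It is a minimal example under stated assumptions: image coordinates are already converted to the same metric units as the focal length, the function name and argument order are hypothetical, and the numbers are invented for the check.

```python
def lateral_point_position(u1, t1, u2, t2, v1, f, s, t0=None):
    """Compute (x, y, z) of a scene point from two points on its EPI line.

    u1, u2: image-plane positions along the epipolar line at times t1, t2
    v1:     image-plane height of the epipolar line
    f:      focal length; s: constant camera speed
    t0:     time of the first image of the sequence (defaults to t1)
    """
    if t0 is None:
        t0 = t1
    delta_x = s * (t2 - t1)          # Equation 1: distance the camera moved
    delta_u = u2 - u1                # distance the point moved in the image
    if delta_u == 0:
        raise ValueError("no image motion: the point is effectively at infinity")
    m = delta_x / delta_u            # Equation 6: slope in distance units
    return ((t1 - t0) * s + m * u1, m * v1, m * f)   # Equation 9

# A point at (2, 1, 10) seen with f = 1 projects to u1 = 0.2, v1 = 0.1.
# After the camera travels one unit, the point appears at u2 = 0.3.
x, y, z = lateral_point_position(u1=0.2, t1=0.0, u2=0.3, t2=1.0, v1=0.1, f=1.0, s=1.0)
```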


Appendix B: Forward-Motion Trajectories
The derivation of the form of a trajectory produced by
a forward motion is similar to the one used for lateral
motion. Figure 29 is a diagram of a trajectory in an EPI
derived from a sequence of images taken by a camera
moving in a straight line at a fixed orientation relative to
the principal axis of the camera (see Figure 30). Without
loss of generality we have rotated the image plane
coordinate systems in a uniform way so that the epipoles are
on the u axes. The EPI in Figure 29 was constructed
by extracting pixel intensities along epipolar lines in the
images shown in Figure 30 and inserting them as scanlines
in Figure 29. For example, epipolar line l_1 was
placed at t_1, l_2 was placed at t_2, and so on. The point
(w_1,t_1) in the EPI corresponds to the point (u_1,v_1) in
the image taken at time t_1 and position c_1. Thus, as
the camera moves from c_1 to c_2 over the time interval
t_1 to t_2, the scene point moves in the EPI from (w_1,t_1)
to (w_2,t_2). Our goal is to characterize the shape of this
trajectory, and then compute the three-dimensional
position of the corresponding scene point, given the focal
length of the camera, the camera speed, the angle
between the camera's axis and the direction of motion (θ),
and the coordinates of points along the trajectory.

As before, we define a left-handed coordinate system that
is centered on the initial position of the camera (i.e., c_1
in Figure 30). The shape of the trajectory can be derived
by examining the geometric relationships in the epipolar
plane that passes through P. Figure 31 is a diagram of
that plane.


Given the speed of the camera, s, which is assumed to
be constant, the distance from c_1 to c_2, Δe, can be
computed as follows:

\[ \Delta e = s \, \Delta t \tag{10} \]

where Δt is (t_2 - t_1). By similar triangles

\[ \frac{w_1}{h} = \frac{C}{e} \tag{11} \]

and

\[ \frac{w_2}{h} = \frac{C}{(e - \Delta e)} \tag{12} \]

where w_1 and w_2 are distances on the image plane, h
is the distance from the lens center to the epipole, C is
the distance from P to the line joining the lens centers
measured in a plane parallel to the image planes, and
e is the distance along the line joining the lens centers
from c_1 to the plane passing through P and parallel to
the image planes. Since h is the hypotenuse of a right
triangle (see Figure 30), it can be computed as follows:

\[ h = \frac{f}{\cos\theta} \tag{13} \]

where f is the focal length of the camera. From 11 and
12 we get

\[ \Delta w = (w_2 - w_1) = \frac{hC}{(e - \Delta e)} - \frac{hC}{e} = \frac{hC \, \Delta e}{e(e - \Delta e)} \tag{14} \]

which can be re-written as

\[ e \, \Delta w \, \Delta e - e^2 \Delta w + hC \, \Delta e = 0. \tag{15} \]

Using 10 to express Δe in terms of Δt, this becomes

\[ s e \, \Delta w \, \Delta t - e^2 \Delta w + s h C \, \Delta t = 0, \tag{16} \]

which defines a hyperbola whose asymptotes are the lines
w = 0 and t = e/s (see Figure 32). Thus, the trajectory
is a hyperbola in which the point P appears arbitrarily
close to the epipole when the camera is far away from
it (as one would expect), and the projection of P moves
away from the epipole at an increasing rate as the camera
gets closer to it. This relationship agrees intuitively with
the fact that a projective transformation involves a 1/z
factor, which makes u a hyperbolic function of t.

Equation 14 can be used to compute z. First, rewrite it
as follows:

\[ \Delta w = \left( \frac{hC}{e(e - \Delta e)} \right) \Delta e \tag{17} \]

Then using Equation 12 and

\[ z = e \cos\theta \tag{18} \]

we get

\[ z = \frac{w_2 \cos\theta \, \Delta e}{\Delta w} \tag{19} \]

or

\[ z = \frac{w_2 \cos\theta \, s \, \Delta t}{\Delta w} \tag{20} \]

Notice that it is NOT necessary to determine the
coefficients of the hyperbola in order to compute z. Two
points on the trajectory are sufficient to compute Δt and
Δw, which in turn, are sufficient to compute z. Also
notice, however, that it is easy to fit a hyperbola of this
type because it is in the simple form

\[ \Delta w \, \Delta t + a \, \Delta w + b \, \Delta t = 0, \tag{21} \]

which is linear with respect to the coefficients a and b.
This type of fitting provides a way to increase the
precision with which the scene points are located.

The expression for z in Equation 20 does not apply when
θ = 90°, but that is the lateral motion case covered
earlier. Thus, the trajectories are always hyperbolas;
they just happen to degenerate into straight lines when
θ = 90°, which corresponds to the case in which the
epipoles are not in the image plane, but rather lie at
infinity.

The x and y coordinates for P can be computed by
scaling z appropriately:

\[ (x, y) = \left( \frac{u_1}{f} z, \; \frac{v_1}{f} z \right) \tag{22} \]

Recall that u_1 and v_1 are measured in a rotated image-
plane coordinate system that was set up to place the
epipole on the u axis. Therefore, in addition to converting
pixel values to a standard metric, such as meters,
the image coordinates of a point must be rotated about
the principal axis before they can be inserted into
Equation 22. To compute a world-centered position for P, the
(x,y,z) position computed by Equations 20 and 22 has
to be transformed for the initial position of the camera
along the path.

References


[Barnard 1980] "Disparity Analysis of Images," S. T.
Barnard and W. B. Thompson, IEEE Trans., PAMI,
Vol 2, No 4, 1980.

[Bridwell 1983] "A Discrete Spatial Representation for
Lateral Motion Stereo," N. J. Bridwell and T. S.
Huang, Computer Vision, Graphics, and Image
Processing, Vol 21, 1983.

[Buxton 1983] "Monocular Depth Perception from Optical
Flow by Space Time Signal Processing," B. F.
and Hilary Buxton, Proceedings of the Royal Society
of London, B, Vol 218, 1983.

[Buxton 1985] "Computation of optic flow from the motion
of edge features in image sequences," B. F. and
H. Buxton, Image and Vision Computing, Vol 2,
No 2, May 1984.

[Haynes 1983] "Detection of Moving Edges," S. M. Haynes
and R. Jain, Computer Vision, Graphics, and Image
Processing, Vol 21, No 3, 1983.

[Hildreth 1984] "Computations Underlying the Measurement
of Visual Motion," E. C. Hildreth, Artificial
Intelligence, Vol 23, 1984.

[Marr 1980] "Theory of Edge Detection," D. C. Marr
and E. Hildreth, Proceedings of the Royal Society of
London, B 207, 1980.

[Nevatia 1976] "Depth Measurement from Motion Stereo,"
Ramakant Nevatia, Computer Graphics and
Image Processing, 5, 1976.

[Ramer 1972] "An Iterative Procedure for the Polygonal
Approximation of Plane Curves," U. Ramer, Computer
Graphics and Image Processing, Vol 1, 1972.

[Ullman 1979] The Interpretation of Visual Motion, S.
Ullman, MIT Press, Cambridge, Mass., 1979.

[Yen 1983] "Determining 3-D Motion and Structure of
a Rigid Body Using the Spherical Projection," B.
L. Yen and T. S. Huang, Computer Graphics and
Image Processing, Vol 21, 1983.


Fig. 2. Close sampling image separation

Fig. 13. Edge features in EPI

Fig. 15. Linked peak lines

Fig. 14. Straight lines

Fig. 16. x-z locations

Fig. 17. Free space for 2 features

Fig. 18. Free space

Fig. 26. Lateral motion EPI

Fig. 27. Lateral motion geometry

Fig. 30. Forward motion geometry

Fig. 32. Asymptotes for the hyperbola (axis label: Distance from the Epipole)


SRI’s Baseline Stereo System

Marsha Jo Hannah

Artificial Intelligence Center, SRI International
333 Ravenswood Ave, Menlo Park, CA 94025


Abstract 

We have implemented a baseline system for automated, area-
based stereo compilation. This system, STEREOSYS, operates
in several passes over the data, during which it iteratively builds,
checks, and refines its model of the 3-dimensional world, as
represented by a pair of images. In this paper, we describe the
components of STEREOSYS and give examples of the results it
produces. We find that these results agree quite well with the
best available benchmark: results produced on the interactive
DIMP system at the U.S. Army Engineer Topographic
Laboratories.

1  Introduction 

Automatic techniques for the production of 3-dimensional
data via stereo compilation are receiving increased interest for
a variety of applications, including cartography [Panton, 1978],
autonomous vehicle navigation [Hannah, 1980], and industrial
automation [Nishihara and Poggio, 1983]. Conventional stereo
compilation techniques, which are based on area correlation, can
produce incorrect results under a variety of conditions, for
example, when views are widely separated in space or time, in the
vicinity of partial occlusions, in featureless or noisy areas, and in
the presence of repeated patterns.

We are investigating ways to overcome these inadequacies.
Our research strategy is first to implement a baseline system that
performs conventional stereo compilation, then to replace pieces
of the system with improved modules as we develop them. Thus,
our baseline system forms the core of an ever-improving stereo
system. We have also tested the baseline system [Hannah, 1985-a]
against a "challenge data base" [Hannah, 1985-b] of image
areas where conventional stereo techniques encounter difficulty.
As currently implemented, our system includes routines to
perform the following operations automatically:

•  Construct hierarchies for stereo images

•  Select "interesting" points for sparse matching

•  Search 2D regions for sparse matches

•  If necessary for uncalibrated imagery, compute relative
camera parameters from sparse matches

•  Compute epipolar lines

•  Locate epipolar matches, using disparity estimates from
sparse matches when available

•  Evaluate matched points for believability

•  Interpolate between matched points

•  Display images and results in left-right stereo, red-green
stereo, or as a monocular disparity field

•  Compute range data and x-y-z coordinates for matched
point pairs

•  Display terrain data in perspective with hidden lines
removed.

We are currently exploring improved techniques for image
matching and match evaluation.

2  The  Stereo  System 

SRI has integrated existing pieces of stereo code into a
baseline system for automated area-based stereo compilation, then
improved the system to its present form. The system operates in
several passes over the data, during which it iteratively builds,
checks, and refines its model of the 3-dimensional world
represented by a pair of images.

The driving program is called STEREOSYS (STEREO SYStem).
It allows the user to invoke a variety of modules to
perform the necessary processing for stereo compilation. In theory,
the modules are independent and can be replaced with improved
versions at will; in practice, there are some unavoidable
interdependencies of global variables that will have to be attended
to.

The following sections describe the components of STEREOSYS
in the order they are normally invoked; examples of their
results are included. Comments are also made as to improvements
that could be made to each of the modules.

2.1  Preliminary  Processing 

Before  the  actual  stereo  matching  can  begin,  some  prelimi¬ 
nary  image  processing  is  necessary.  This  includes  the  creation  of 
the image hierarchy and the selection of the interesting points to
be  matched. 

2.1.1  Creating the Image Hierarchy

The basis for the image matching techniques is a hierarchy of
images, as shown in Figure 1. The module of STEREOSYS that
forms this hierarchy from the original images is called REDUCE.
In the example used for the figures, the original images are a pair
of image "chips" digitized from standard 9" x 9" mapping photos
taken over Phoenix South Mountain Park, near Guadalupe
(a suburb of Phoenix), Arizona. These images are 2048 x 2048
pixels in size, and cover an area that is approximately 2
kilometers square on the ground; elevations in the area range from
360 to 540 meters. The reduction hierarchy consists of a pyramid
of images, each at half of the resolution of its parent; in this
case REDUCE produces pairs of images that are 1024 x 1024,
512 x 512, 256 x 256, 128 x 128, 64 x 64, 32 x 32, and 16 x 16
pixels in size. (Figure 1 shows only the 256 x 256 through 16 x 16
image pairs.)

REDUCE ordinarily produces pixels in each reduced image by
convolving the image with a Gaussian, then sub-sampling [Burt,
1981]. Older code also exists to reduce images by simple
averaging of the pixels in an N x N square from the next-largest
image (in most cases, N=2). It is known that this technique can
produce artifacts in the data, and the more sophisticated Gaussian
technique is preferred.
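
The reduction step can be sketched as follows. This is a minimal
illustration, not the REDUCE code itself: the 5-tap binomial kernel
is an assumed approximation to the Gaussian, and the names smooth,
reduce_once, and build_pyramid are hypothetical.

```python
import numpy as np

# Assumed 5-tap binomial weights approximating a Gaussian (not taken from the paper).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def smooth(image):
    """Separable blur; image borders are handled by edge replication."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 2, mode="edge")
    # Filter along columns of each row, then along rows.
    rows = sum(wt * padded[:, i:i + w] for i, wt in enumerate(KERNEL))
    return sum(wt * rows[i:i + h, :] for i, wt in enumerate(KERNEL))


def reduce_once(image):
    """Blur, then keep every second pixel in each direction."""
    return smooth(image)[::2, ::2]


def build_pyramid(image, smallest=16):
    """Return the image followed by successively halved copies."""
    levels = [np.asarray(image, dtype=float)]
    while min(levels[-1].shape) > smallest:
        levels.append(reduce_once(levels[-1]))
    return levels
```

Applied to a 2048 x 2048 image with smallest=16, this loop performs
seven reductions, which corresponds to the 1024 x 1024 through
16 x 16 levels listed above.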

2.1.2  Selecting  Interesting  Points 

The first step in the matching process is to procure a set
of well-scattered, reliable matches in the image. Our approach
is first to select areas in one image that contain sufficient
information to produce reliable matches. To accomplish this, a
statistical operator based on image variance and edge strength is
passed over the image; local peaks in the output of this operator
are recorded as the preferred places to attempt the matching
process.

Historically, such operators have been called interest
operators, and the peaks in the operator output have been called
interesting points [Moravec, 1980]. This nomenclature is somewhat
misleading, as the points selected are rarely interesting to
a human observer; however, these terms have been in use in the
computer vision community for over 10 years. It should be noted
that present interest operators are not feature detectors; the same
operator run over both images of a stereo pair will not necessarily
pick out the same points in the two images. In our system, the
interest operator is run in only one of the images, where it selects
points that are to be matched in the second image by various
correlation techniques. (A possible enhancement to STEREOSYS
would be to design and implement efficient interest operators that
really do choose "interesting" points, such as crossroads, building
corners, sharp bends in rivers, etc.)

The module INTEREST permits the user to specify the interest
operator to be used [Hannah, 1980], the window size over
which it is calculated, and the minimum spacing for interesting
points. It also provides the capability to divide the image into
a grid of subimages, and records the relative ranks of the
interesting points within their grid cells; this permits the most
interesting point(s) in each area to be matched first. Figure 2
shows the interesting points for the right image of the Phoenix
pair; the numbers indicate the 1st, 2nd, 3rd, and 4th most
interesting points in a 6 x 6 grid of cells.
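
The selection and ranking of interesting points can be sketched as
below. This is an illustrative sketch only: it uses plain window
variance as the interest measure (the edge-strength term and the
minimum-spacing rule of INTEREST are omitted), and the function
names and default parameters are hypothetical.

```python
import numpy as np


def local_variance(image, half=2):
    """Variance of the (2*half+1)-square window around each interior pixel."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    out = np.zeros((h, w))
    for r in range(half, h - half):
        for c in range(half, w - half):
            out[r, c] = img[r - half:r + half + 1, c - half:c + half + 1].var()
    return out


def interesting_points(image, half=2, grid=2, per_cell=2):
    """Map each grid cell to its highest-ranked local peaks of interest."""
    score = local_variance(image, half)
    h, w = score.shape
    cells = {}
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            s = score[r, c]
            # A peak is a non-zero value that is the maximum of its 3 x 3 neighbourhood.
            if s > 0 and s == score[r - 1:r + 2, c - 1:c + 2].max():
                cell = (r * grid // h, c * grid // w)
                cells.setdefault(cell, []).append((s, r, c))
    # Rank within each cell, most interesting first.
    return {cell: [(r, c) for s, r, c in sorted(pts, reverse=True)[:per_cell]]
            for cell, pts in cells.items()}
```

Keeping only the first entry of each list gives the "most
interesting point in each grid cell" used by the preliminary
matching passes.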

2.2  Preliminary  Matching

At this point in the processing, it is possible to take one of
two different approaches to the matching. If nothing is known
regarding the absolute camera positions and orientations (as would
be the case for a stereo pair taken with handheld cameras), an
unstructured hierarchical matching algorithm is used on the most
interesting points. The results of these matches are used in
seeking a solution for a simplistic relative camera model (5 angles
describing the relative positions and orientations of 2 ideal
pinhole cameras [Hannah, 1974]), which can then be used for the
epipolar constraint in further matching. On the other hand, if
the camera parameters are known (as would be the case for the
highly calibrated cartographic stereo images intended for terrain
mapping), matching can proceed directly with the epipolar
constraints.

2.2.1  Unconstrained  Hierarchical  Matching

Unconstrained hierarchical matching is done by the module
HMATCH. HMATCH assumes that nothing is known about the
relative orientations of the images, other than that they cover
approximately the same area, at about the same scale, with no
major rotation between the images. It matches each specified
point (usually the most interesting point in each grid cell) using
an unguided hierarchical matching technique similar to that
reported in Moravec [1980]. This technique begins with the point
in the largest image (the 2048 x 2048 right image of the Phoenix
set), traces it back through that image's hierarchy (in our
example, it repeatedly halves the co-ordinates of the point) until
it reaches an image that is approximately the size of the
correlation window (the 16 x 16 image for the 11 x 11 correlation
windows that we used). It then uses a 2-dimensional spiral
search, followed by a hill-climbing search for the maximum of the
normalized cross correlation between the image windows [Quam,
1971]. This global match is then refined back down the image
hierarchy; that is, the disparity at each level (suitably
magnified to account for relative image scales) is used as a
starting point for a hill-climb search at the next level. The
plausibility of the final match is then checked by reversing the
roles of the right and left images and repeating the unconstrained
hierarchical search, starting with the just-found matching point.
In order for the match to be believed, this reverse search must
produce a match at (or immediately adjacent to) the original
interesting point. The addition of this constraint, which
effectively requires co-operative results between the two images,
has greatly reduced the number of bad matches in photographs which
violate one or more of the assumptions around which STEREOSYS was
designed. The correlation window size remains constant at all
levels of the hierarchy, so the match is effectively performed
first over the entire image, then over increasingly local areas of
the image. This technique permits the use of the overall image
structure to set the context for a match; the gradually increasing
detail in the imagery is then followed down through the hierarchy
to the final match.
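
The coarse-to-fine search and the reverse check can be sketched as
follows. This is a simplified illustration under stated
assumptions, not the HMATCH code: an exhaustive search at the
coarsest level stands in for the spiral search, 2 x 2 averaging
stands in for REDUCE, image dimensions are assumed even at every
level, and all function names are hypothetical.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross correlation of two equal-sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def window(img, r, c, half):
    """Square window centred on (r, c), or None if it leaves the image."""
    if r < half or c < half or r + half >= img.shape[0] or c + half >= img.shape[1]:
        return None
    return img[r - half:r + half + 1, c - half:c + half + 1]


def score(right, left, pt, cand, half):
    """Correlation between the window at pt (right) and at cand (left)."""
    a = window(right, pt[0], pt[1], half)
    b = window(left, cand[0], cand[1], half)
    return -1.0 if a is None or b is None else ncc(a, b)


def hill_climb(right, left, pt, start, half):
    """Step to the best 8-neighbour until no neighbour improves the score."""
    best, best_s = start, score(right, left, pt, start, half)
    while True:
        nbrs = [(best[0] + dr, best[1] + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
        s, cand = max((score(right, left, pt, n, half), n) for n in nbrs)
        if s <= best_s:
            return best, best_s
        best, best_s = cand, s


def halve(img):
    """2 x 2 block averaging (assumes even dimensions)."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def pyramid(img, levels):
    """List of images, finest first, each half the size of the previous."""
    out = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        out.append(halve(out[-1]))
    return out


def hmatch(right_pyr, left_pyr, pt, half=2):
    """Coarse-to-fine match of pt = (row, col) in the finest right image."""
    top = len(right_pyr) - 1
    trace = [(pt[0] >> k, pt[1] >> k) for k in range(top + 1)]  # halve repeatedly
    coarse = left_pyr[top]
    # Exhaustive search at the coarsest level stands in for the spiral search.
    cands = [(r, c) for r in range(coarse.shape[0]) for c in range(coarse.shape[1])]
    s, match = max((score(right_pyr[top], coarse, trace[top], m, half), m) for m in cands)
    for k in range(top - 1, -1, -1):
        # Double the disparity found one level up and use it as the start.
        dr, dc = match[0] - trace[k + 1][0], match[1] - trace[k + 1][1]
        start = (trace[k][0] + 2 * dr, trace[k][1] + 2 * dc)
        match, s = hill_climb(right_pyr[k], left_pyr[k], trace[k], start, half)
    return match, s


def believable(right_pyr, left_pyr, pt, half=2):
    """Reverse check: matching back must land on or next to pt."""
    match, _ = hmatch(right_pyr, left_pyr, pt, half)
    back, _ = hmatch(left_pyr, right_pyr, match, half)
    return max(abs(back[0] - pt[0]), abs(back[1] - pt[1])) <= 1
```

Note that the window half-width is the same at every level, so the
window covers a large share of the coarsest image and a small
neighbourhood of the finest one, as described above.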

Figure 3 shows the results of this technique on a point in the
Phoenix set. The image hierarchy is the same as in Figure 1,
with the addition of image chips covering the matched area in
the 2048 x 2048, 1024 x 1024, and 512 x 512 images; these are
shown in the upper right corner of each hierarchy. The matching
began in the right image in the 2048 x 2048 chip, traced this
point through the right-hand hierarchy (approximately clockwise
in the figure) to the 16 x 16 right image, matched that to the
16 x 16 left image, then refined the match back through the left
image hierarchy until reaching the left 2048 x 2048 chip.

It is instructive to look at the correlation coefficients for these
matches (see Table 1). In the smaller images, the correlation is
poor, since the window covers a large area of terrain with a great
deal of relief. As the matching moves up the hierarchy, the
correlation improves, because the window now approximates an area
at a single elevation. After reaching the 256 x 256 images,
however, the correlation begins to decline, both in absolute value
and with respect to an autocorrelation-based threshold [Hannah,
1974]. This is due to noise in the images; if one examines the
chip from the 2048 x 2048 left image, one will see several streaks
across the image, representing scratches on the original
photograph and/or dropped data in the digitization; close
examination also reveals a grainy noise pattern. Because the
degraded correlations will cause difficulties in determining which
matches are the correct ones, our processing has gone only to the
1024 x 1024 images, the highest resolution image in which the
noise was considered tractable. Once processing is complete,
STEREOSYS can be used to refine the final matches from this level
down to the original 2048 x 2048 images.

Figure 4 shows the results of HMATCH on the most interesting
point in each grid cell. Only the points thought to have
been matched correctly are shown; those with poor correlation
or whose matches fell outside of the image have been discarded
by STEREOSYS.

2.2.2  Relative  Camera  Model  Calibration 

If no camera calibration information is available, the module
C2MODEL can be used to calculate a simplistic relative camera
model from a set of matched point pairs. This is accomplished by
searching for 5 angles: the azimuth and elevation of the second
camera's focal point with respect to the first camera; and pan,
tilt, and roll of the second camera's axes with respect to those
of the first. The object of the search is to minimize the error
between the matched point in the second image and the epipolar
line produced when the point in the first image is projected
through the hypothesized pinhole cameras. The search proceeds
by a linearization of the equations and their analytic derivatives
[Gennery, 1980]. Once a solution is found, the reliability of the
matched points is assessed. Points that appear to contribute too
much error to the solution are removed from the calculation, and
the solution is redone. Either this process reaches a successful
conclusion when the point set is found to be consistent, or it
reports failure if too many of the point pairs are rejected.
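
The solve, reject, and re-solve loop can be illustrated in
miniature. The sketch below substitutes a least-squares
straight-line fit for the five-angle camera model, so it shows only
the control flow; the thresholds and the name fit_with_rejection
are hypothetical.

```python
import numpy as np


def fit_with_rejection(x, y, max_error=1.0, min_fraction=0.5):
    """Fit a line, drop the worst point while it exceeds max_error, refit.

    Returns (model, keep): model is (slope, intercept), or None when too
    many points were rejected; keep marks the points still in the solution.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(x), dtype=bool)
    while True:
        # Failure: too few consistent points remain.
        if keep.sum() < max(2, min_fraction * len(x)):
            return None, keep
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        err = np.abs(y - (slope * x + intercept))
        worst = int(np.argmax(np.where(keep, err, -np.inf)))
        # Success: every remaining point is consistent with the model.
        if err[worst] <= max_error:
            return (slope, intercept), keep
        keep[worst] = False
```

Removing one point per pass, rather than every point above the
threshold, keeps a single gross error from dragging good points out
of the solution along with it.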

The resulting camera model is quite crude, as it must depend
on a guess as to the focal lengths of the cameras and the length
of the baseline between the cameras. Also, it assumes that we
are using pinhole cameras, thus totally ignoring the internal
geometry of real cameras. However, in many cases, it is suitable
for approximating the epipolar constraint to simplify further
matching.

2.2.3  Epipolar  Constrained  Hierarchical  Matching 

If the camera parameters are given (or once the crude ones
have been derived), matching can proceed somewhat more
efficiently. The camera parameters define the manner in which a
point in the first image projects to a line in the second image:
the epipolar constraint. This constraint can be used to cut the
search from two dimensions (all over the image) to one dimension
(back and forth along the epipolar line), as implemented in the
module LMATCH.
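
The reduction from a 2-dimensional to a 1-dimensional search can be
sketched as below. The sketch assumes the camera model has already
been reduced to a 3 x 3 fundamental matrix F, which is a swapped-in
representation (the paper describes a five-angle model); points are
written (x, y), and all names are hypothetical.

```python
import numpy as np


def epipolar_line(F, pt):
    """Line (a, b, c), with a*x + b*y + c = 0, in image 2 for pt in image 1."""
    a, b, c = F @ np.array([pt[0], pt[1], 1.0])
    n = np.hypot(a, b)  # normalize so that distances come out in pixels
    return a / n, b / n, c / n


def distance_to_line(line, pt):
    """Perpendicular distance of pt = (x, y) from a normalized line."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c)


def candidates_along_line(line, width, height):
    """Integer pixel positions on the line that fall inside the image."""
    a, b, c = line
    pts = []
    if abs(b) >= abs(a):  # closer to horizontal: step along x
        for x in range(width):
            y = int(round(-(a * x + c) / b))
            if 0 <= y < height:
                pts.append((x, y))
    else:  # closer to vertical: step along y
        for y in range(height):
            x = int(round(-(b * y + c) / a))
            if 0 <= x < width:
                pts.append((x, y))
    return pts
```

Only the returned positions need to be scored by the correlation
search, instead of every pixel in the second image.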

LMATCH proceeds very much like HMATCH, except that
the search for a match is confined to the vicinity of the epipolar
line. Because we assume that there is no outside information
to indicate where these preliminary matches lie along the line,
we again use the hierarchical technique to search out and refine
the match. If relative camera parameters have been derived,
LMATCH is used on the second most interesting point in each
grid cell and on any already-matched points that C2MODEL
indicated were unreliable; the results of this mode are shown in
Figure 5. If the true camera parameters have been supplied,
LMATCH is used on the two most interesting points in each grid
cell; these results are shown in Figure 6.

2.3  Anchored  Matching 

Once several reliable matches have been found, they can be
used as "anchor" points for further matching. Our basic
technique for this again uses the grid cells in the image. A given
point will lie in some grid cell; the closest matched point(s) will
lie in that cell or in one of the 8 neighboring cells. Under the
assumption that the world is generally continuous, a point would
be expected to have a disparity similar to that of its neighbors.
Thus, the disparity at a point is expected to lie in the interval
of the disparities of the well-matched points in the current and
neighboring cells. This disparity interval is used along with the
epipolar constraint to perform a very local search for the match
to a point. Note that a point is considered to be well-matched
if it has a correlation above a user-settable absolute threshold,
usually 0.5, and above a variable threshold, based on the
autocorrelation function around the point in the first image (see
Table 1 for examples); in addition, a well-matched point cannot
deviate more than a specified distance from the epipolar line.
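
The anchoring rule and the acceptance test can be sketched as
follows. This is a minimal sketch: disparities are treated as
scalars measured along the epipolar line, the default thresholds
are illustrative values, and the function names and the
dictionary layout are hypothetical.

```python
def disparity_interval(matches_by_cell, cell):
    """Range of disparities of well-matched points in a cell and its 8 neighbours.

    matches_by_cell maps (row, col) grid cells to lists of disparities.
    Returns (low, high), or None when no anchor is available nearby.
    """
    r, c = cell
    ds = [d
          for dr in (-1, 0, 1)
          for dc in (-1, 0, 1)
          for d in matches_by_cell.get((r + dr, c + dc), [])]
    return (min(ds), max(ds)) if ds else None


def well_matched(corr, auto_threshold, epipolar_dist,
                 abs_threshold=0.5, max_dist=1.0):
    """All three acceptance criteria must hold for a match to be kept."""
    return (corr > abs_threshold          # user-settable absolute threshold
            and corr > auto_threshold     # autocorrelation-based threshold
            and epipolar_dist <= max_dist)  # stays close to the epipolar line
```

The interval returned by disparity_interval bounds the section of
the epipolar line that has to be searched for a new point in that
cell.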

2.3.1  Matching  the  Rest  of  the  Interesting  Points

At this point in our processing, we have matched the two most
interesting points in each grid cell. This is still rather sparse
information, so we next invoke the module PMATCH to match
the balance of the interesting points. It uses the anchored match
technique described above, searching along a portion of the
epipolar line, to find these matches. Figure 7 shows the results
of this module. Only points found to be well-matched are recorded.

2.3.2  Matching  a  Grid  of  Points 

STEREOSYS permits the user to produce matched points on
a closely spaced grid, if desired. The module GMATCH also uses
the anchored match technique, searching along the epipolar line,
to calculate matches on a user-specified grid. Figure 8 shows the
results of this module on a 20 x 20 grid. The smaller marks
indicate matches in which STEREOSYS has little confidence;
these are currently not recorded in the data structure, leaving
holes in the grid. This points up a problem with grid matching:
not all areas of an image have information suitable for matching,
and forcing a match at such areas can lead to poor results.

2.4  Post  Processing 

Although not strictly stereo processing, there are follow-up
processes which are necessary to turn stereo disparities into more
meaningful 3-dimensional quantities. These processes include
interpolation and terrain modeling.

2.4.1  Interpolation 

Often, it is not feasible to apply correlation matching at
points on a pre-determined grid. Even when grid matching is
feasible, there will be areas of the images that cannot be matched,
due to noise in the data, insufficient information, or changes such
as moving vehicles; this will result in "holes" in the grid of
terrain data, which must be filled in somehow. And, frequently, a
terrain model is desired that has its points more closely spaced
than that provided by the stereo matching process. In all of these
cases, interpolation of the matched data points is necessary to
provide information at other points. STEREOSYS incorporates an
efficient interpolation scheme [Smith, 1984], permitting the user
to construct elevation data grids from either randomly spaced
points or a widely spaced grid of points.

2.4.2  Terrain  Modeling 

Given the dense grid of matched points and the camera
calibration, it is possible to derive a digital terrain model
(DTM). If absolute external camera information and internal
camera calibration are available, the module STERDTM can be used
to create a reasonably accurate DTM, which can then be displayed
with another program, DTMICP. (An example of DTMICP output is
shown in Figure 9; it can also produce range images of the terrain
or pictures of the original imagery "painted" on the terrain.) If
the only camera information is C2MODEL's relative model, then the
module RELDEPTH can be used to create a relative DTM. However,
due to the many over-simplifications and the computational
instability of the relative camera model, such relative DTMs are
of very low accuracy, and their use is discouraged.

3  Evaluation 

Evaluation of the accuracy of STEREOSYS is difficult, as
there do not seem to exist stereo data sets with known ground
truth against which to compare our results. We do, however,
have the results of an interactive stereo compilation algorithm
called Digital Interactive Mapping Program (DIMP), produced
and operated by the U.S. Army Engineer Topographic
Laboratories (ETL) [Norvelle, 1981]. It should be noted, however,
that ETL's results were obtained by an interactively coached
process, which was run on a 5 x 5 grid in the 2048 x 2048 images
of the Phoenix data set, and which used correlation windows warped
to account for the local steepness of the terrain, while ours were
obtained by a fully automatic process that ran on randomly spaced
interesting points in the 1024 x 1024 images without warping.
Comparing them is a little like comparing apples and oranges,
but we did so in the following manner.

Comparisons were made only for those points for which
STEREOSYS recorded an answer. Points were said to have the
same answer if the STEREOSYS result and the result at the
closest DIMP grid point (scaled into the 1024 x 1024 image in
which STEREOSYS produced its results) were within one pixel
of having the same disparity. Points about which there was
disagreement were examined manually. An analyst looked at both
results, overlaid on the images at a variety of resolutions, both
monocularly and using a stereoscopic viewer, then decided which
algorithm appeared to be in error and, based on experience with
correlation algorithms, attempted to determine why the mistake
had been made.
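
The agreement test can be written down compactly. The sketch
below treats disparities as scalars and assumes that scaling from
the 2048 x 2048 images to the 1024 x 1024 images is a factor of
0.5; the grid spacing and function names are hypothetical.

```python
def nearest_grid_point(pt, spacing):
    """Snap a point to the closest node of a regular grid."""
    return tuple(int(round(v / spacing)) * spacing for v in pt)


def same_answer(stereosys_disp, dimp_disp_full_res, scale=0.5, tolerance=1.0):
    """True when the two disparities agree to within one pixel.

    dimp_disp_full_res is measured in the 2048 x 2048 images and is scaled
    into the 1024 x 1024 images before the comparison.
    """
    return abs(stereosys_disp - dimp_disp_full_res * scale) <= tolerance
```

Points for which same_answer is false are the ones that would be
passed to the analyst for manual examination.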

On the Phoenix data set, STEREOSYS found 5545 "interesting
points," of which it thought it could reliably match 4676.
Of these, only 43 disagreed significantly with the DIMP results
for nearby points. Closer examination showed 15 of these to be
uncorrected DIMP errors, 15 were STEREOSYS errors, 5 were
points on which both systems appear to have made errors, and 8
were points for which the analyst could not determine which
system was in error. In most of the cases, the DIMP errors seemed
to result from its algorithm having drifted gradually off track
(usually starting in an area with little information), and its
operator not catching it soon enough. The STEREOSYS approach of
first providing a context in which to work, so that the code
interpolates disparities, instead of extrapolating them, should
remedy this problem. Most of the STEREOSYS errors (and almost all
of the points for which the analyst could not determine which
algorithm was at fault) appeared to have resulted from an
inappropriate threshold on the interest value: STEREOSYS was
trying to match areas in which there was not enough information
to make reliable matches. Some of the STEREOSYS errors
were due to not using warped correlation windows to account for
the slopes. In these cases, most of the information in a window
would be in a corner of the window, so the disparity that was
calculated was that of the corner, not the center of the window;
using warped correlation or exponentially weighted interest
operators and correlation windows [Quam, 1984] would solve this
problem. A fair number of the mistakes (particularly the ones
in which both systems arrived at different wrong answers) were
because of artifacts in the data: film grain, scratches, lint,
hairs, fiducial marks, and the like; we are a long way from being
able to understand, let alone automate, the human ability to
identify offending objects and then ignore them in processing
stereo data.

STEREOSYS has also been used on several other data sets
in our "challenge data base", described in Hannah [1985-b]. For
data sets with no DIMP results, a much smaller number of
points were matched. These were then compared with the human
viewer's perception of what were the correct matches. Only the
more blatant mistakes were detected and further analyzed; the
results are presented in Hannah [1985-a].

4  Discussion 

Our objective in constructing STEREOSYS was to implement
a state-of-the-art, area-based system for stereo compilation
operating on aerial photography. Along the way, we hoped to remedy
some of the obvious problems we had seen with existing systems,
such as DIMP's tendency to extrapolate itself off track. In this
we have succeeded.

Because STEREOSYS uses fairly independent judgment on
each match, it tends to avoid the problems we have seen in the
DIMP results; indeed, on the Phoenix data set, STEREOSYS
was able to duplicate DIMP's correct results (for the points tried)
and rectify a number of DIMP's mistakes. Although it happens
rarely, it is still possible for STEREOSYS to make mistakes in
the early stages of its processing, then propagate these mistakes
into later matches. To avoid this, more work needs to be done
on algorithms for detecting improperly matched points, so they
can be removed before further processing.

The major criticism we have heard of STEREOSYS is that
it produces matches at randomly spaced points (only where
adequate information is present), when what is usually wanted is
a closely spaced regular grid of elevation points, regardless of
image content. So far, attempts at blindly interpolating the
disparity data (ignoring the image data) as reported in Smith
[1984] have proven less than satisfying. Marriage of the
STEREOSYS techniques with something like DIMP, or with
hierarchical warp correlation [Quam, 1984], or with image
intensity-based interpolation [Smith, 1985 or Baker, 1982] might
be profitable.

We have performed one experiment as a preliminary study
in how to integrate the strengths of STEREOSYS with those
of an edge-based matcher. The results of STEREOSYS were
used as seeds for an edge-based matching system [Baker, 1982],
which propagated these matches along the nearby zero-crossing
contours, then did one iteration of edge matching. Because
determining disparity constraints is a large part of the
edge-based matcher's processing, introducing this information from
STEREOSYS's results produced a significant reduction in
computation time used by the edge-based matcher. The number of
matched points also increased by about an order of magnitude
over the results of STEREOSYS alone. Although we have not
yet finished a quantitative evaluation of these match accuracies, a
qualitative analysis indicates that the results from the combined
technique are significantly more accurate than the results of the
edge-based system alone.

Overall, we have found that STEREOSYS performs credibly
on the low-resolution aerial imagery for which it was designed.
It has difficulties when processing areas that violate its premises
about the continuity of the world, but linking it with an
edge-based matcher (which would excel in these types of areas)
seems to be a promising approach.

Acknowledgements 

The research reported herein was supported by the Defense
Advanced Research Projects Agency under Contract
MDA903-83-C-0027, which is monitored by the U.S. Army Engineer
Topographic Laboratory. The views and conclusions contained in
this paper are those of the author and should not be interpreted
as necessarily representing the official policies, either expressed
or implied, of the Defense Advanced Research Projects Agency
or of the United States Government.

I would like to thank Harlyn Baker, Robert Bolles, Martin
Fischler, Lynn Quam, and Grahame Smith for their support on
this project.


References

Baker, H. Harlyn, 1982. "Depth from Edge and Intensity
Based Stereo," Ph.D. Thesis, Stanford University Computer
Science Department Report STAN-CS-82-930, September, 1982.

Burt, Peter J., 1981. "Fast Filter Transforms for Image
Processing," Computer Graphics and Image Processing, Vol. 16, pp.
20-51, 1981.

Gennery, Donald B., 1980. "Modelling the Environment of
an Exploring Vehicle by Means of Stereo Vision," Ph.D.
Thesis, Stanford University Computer Science Department Report
STAN-CS-80-805, June, 1980.

Hannah, Marsha Jo, 1974. "Computer Matching of Areas
in Stereo Images," Ph.D. Thesis, Stanford University Computer
Science Department Report STAN-CS-74-438, July, 1974.

Hannah, Marsha Jo, 1980. "Bootstrap Stereo," Proceedings:
Image Understanding Workshop, College Park, MD, April, 1980,
pp. 201-208.

Hannah, Marsha Jo, 1985-a. "Evaluation of STEREOSYS
vs Other Stereo Systems," SRI International Artificial
Intelligence Center Technical Note 365, October, 1985.

Hannah, Marsha Jo, 1985-b. "The Stereo Challenge Data
Base," SRI International Artificial Intelligence Center Technical
Note 366, October, 1985.

Moravec, Hans P., 1980. "Obstacle Avoidance and Navigation
in the Real World by a Seeing Robot Rover," Ph.D.
Thesis, Stanford University Computer Science Department Report
STAN-CS-80-813, September, 1980.

Nishihara, H. Keith, and Tomaso Poggio, 1983. "Stereo
Vision for Robotics," Proceedings of the International Symposium
of Robotics Research, Bretton Woods, NH, September, 1983.

Norvelle, F. Raye, 1981. "Interactive Digital Correlation
Techniques for Automatic Compilation of Elevation Data," U.S.
Army Engineer Topographic Laboratories Report ETL-0272,
October, 1981.

Panton, Dale J., 1978. "A Flexible Approach to Digital Stereo
Mapping," Photogrammetric Engineering and Remote Sensing,
Vol. 44, No. 12, pp. 1499-1512.

Quam, Lynn H., 1971. "Computer Comparison of Pictures,"
Ph.D. Thesis, Stanford University Computer Science Department
Report STAN-CS-71-219, May, 1971.

Quam, Lynn H., 1984. "Hierarchical Warp Stereo,"
Proceedings: Image Understanding Workshop, New Orleans, LA,
October, 1984, pp. 149-156.

Smith, Grahame B., 1984. "A Fast Surface Interpolation
Technique," Proceedings: Image Understanding Workshop, New
Orleans, LA, October, 1984, pp. 211-215.

Smith, Grahame B., 1985. "Stereo Reconstruction of Scene
Depth," Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, San Francisco, CA, June 9-13, 1985.


Figure 1-Reduction Image Hierarchy.


Figure 2-Interesting Points, Ranked by Grid Cell.


Figure 3-Hierarchical Match of an Interesting Point.


Table 1-Hierarchical Correlations for Point in Figure 3.

Image size   Point 1        Point 2        Correlation  Autocorrelation
16x16        (5, 11)        (0, 11)        0.140463     0.677117
32x32        (10, 33)       (10, 33)       0.184883     0.437053
64x64        (38, 48)       (37, 48)       0.738681     0.738417
128x128      (77, 02)       (78, 03)       0.030033     0.886280
256x256      (164, 184)     (163, 184)     0.364608     0.018228
512x512      (308, 380)     (308, 380)     0.018083     0.030438
1024x1024    (818, 738)     (813, 137)     0.760448     0.032047
2048x2048    (1333, 1478)   (1333, 1476)   0.341823     0.700017



Figure 4-Results of Unstructured Hierarchical Matching of Most
Interesting Point in Each Grid Cell.


Figure 5-Results of Epipolar Hierarchical Matching of Second
Most Interesting Point in Each Grid Cell.


Figure 8-Results of Anchored Epipolar Matching of a Grid of
Points.


Figure 9-A View of the Resulting Digital Terrain Model.




CONCURRENT  MULTILEVEL  RELAXATION 


Demetri Terzopoulos


MIT Artificial Intelligence Lab.
545 Technology Square
Cambridge, MA 02139


Abstract

Multilevel relaxation techniques lead to highly efficient
iterative algorithms that are well suited to the optimization
solution of problems arising in low level image
understanding. Standard schemes for coordinating
multilevel relaxation processes are designed for sequential
machines and leave all but a single level idle at any
time. This paper develops a concurrent multigrid
coordination strategy which maintains simultaneous activity in
all levels. Consequently, concurrent multigrid relaxation
can fully exploit the processing power offered by the new
generation of massively parallel, fine-grained hardware
with local interprocessor connections.


1.  Introduction 

It has become increasingly evident in recent years that
explicit recognition of the presence of multiple scales of
resolution leads to approximation methods and associated
algorithms of great power and efficiency. As a consequence,
machine vision researchers are intensively developing various
approaches to multiresolution image processing and analysis
[Rosenfeld, 1984]. Among these is an approach that adapts ideas
central to a class of iterative numerical techniques known as
multigrid relaxation methods [Terzopoulos, 1983]. Efficient
algorithms based on these methods have been developed for a
number of computational problems that arise in the early stages
of image understanding [Terzopoulos, 1983, 1986a; Glazer, 1984].

Multigrid relaxation techniques were originally targeted to
the efficient numerical solution of elliptic partial
differential equations [Hackbusch and Trottenberg, 1982].
Equations of this kind express necessary conditions for the
stationary points of variational principles. Stated as the
optimization of certain objective functionals (i.e., energy
functionals $\mathcal{E}(u)$), the latter, it turns out, arise
naturally in early vision from the regularization of a wide
range of mathematically ill-posed inverse visual problems
(refer, e.g., to the MIT progress report by Poggio et al. in
these proceedings and to [Terzopoulos, 1986b]). Because of this
relationship, the utility of multigrid methodology in early
vision is potentially very broad.

Another attraction of multigrid methods is that they can be
designed to require only local, parallel computations. Hence,
they are implementable on the locally interconnected, parallel
computer architectures now made possible by VLSI technology; in
particular, the massively parallel processors that have been
conceived for image processing and AI applications
[Batcher, 1980; Hillis, 1985].

In this paper, we develop a concurrent multigrid relaxation
scheme for early visual processing which, unlike standard
multigrid algorithms, is designed to fully utilize the
processing power offered by this new generation of distributed
computers.

1.1. Conventional Multigrid Processes

From a Fourier analysis point of view, since relaxation is a
local operation, it can compute efficiently only a confined
range of spatial components of the desired discrete solution.
The useful efficiency range is determined by the resolution of
the grid. More precisely, the shortest wavelength components of
the approximation error function (those on the order of the
internode spacing) are annihilated quickly, whereas the longer
wavelength components persist over many iterations
[Brandt, 1977].
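The frequency selectivity of relaxation can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a 1-D model problem (the discrete Laplace equation with zero boundary values), a damped Jacobi sweep with damping factor 2/3, and hypothetical function and parameter names. It starts from an error consisting of a single sine mode and reports what fraction of that error survives a number of sweeps.

```python
import math


def surviving_error(n, k, sweeps, omega=2.0 / 3.0):
    """Fraction of a sine-mode error left after damped Jacobi sweeps.

    n      -- number of interior nodes of the 1-D grid (assumed model problem)
    k      -- wavenumber of the initial error mode, 1 <= k <= n
    sweeps -- number of relaxation sweeps
    omega  -- Jacobi damping factor (an assumption of this sketch)
    """
    # Error of the homogeneous problem: a pure sine mode, zero at both ends.
    e = [math.sin(math.pi * k * (i + 1) / (n + 1)) for i in range(n)]
    start = max(abs(x) for x in e)
    for _ in range(sweeps):
        padded = [0.0] + e + [0.0]  # zero boundary values
        # Jacobi: every node is replaced using only the old neighbor values.
        e = [(1.0 - omega) * padded[i]
             + omega * 0.5 * (padded[i - 1] + padded[i + 1])
             for i in range(1, n + 1)]
    return max(abs(x) for x in e) / start
```

Under these assumptions, `surviving_error(63, 48, 10)` (a short wavelength) comes out many orders of magnitude smaller than `surviving_error(63, 1, 10)` (the longest wavelength on the grid), which is the behavior that motivates adding coarser grids.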


Multigrid methods significantly extend the efficiency range of
relaxation through the use of exponentially tapered
multiresolution grid hierarchies. In practice, one uses a set
of regular, discrete grids with successive doubling of
internode separations from one level to the next coarser level
(i.e., successive halving of resolution). A portion of the
standard grid hierarchy is illustrated in Fig. 1. In principle,
it can be mapped onto regularly interconnected VLSI
architectures, and in a fully parallel multigrid
implementation, each node would represent an individual
processing element.


Figure 1. Typical multigrid organization. A portion of
three levels of the 2:1 multigrid hierarchy is illustrated. Only
nearest neighbor interprocessor connections are shown.

Relaxation in either Jacobi (parallel) or Gauss-Seidel (serial)
form is the basic iterative solution method employed on each
level. In conjunction with intralevel relaxation processes, a
series of coarse-to-fine (extension) and fine-to-coarse
(restriction) processes permit transfer of information between
adjacent levels. These subprocesses are coordinated to compute
the multigrid solution.

1.2.  Recursive  vs  Concurrent  Multigrid 
Coordination 

Conventional multigrid schemes employ recursive coordination
strategies [Brandt, 1977]. The solution of a discrete problem
on the finest grid requires the solution of a sequence of
related problems on coarser levels which, in turn, require the
solution of related problems on still coarser levels. The
recursion is carried to a depth where the grids are
sufficiently coarse that the expense in computing solutions
becomes trivial. Further, the recursive multigrid scheme can be
applied starting from the coarsest level and proceeding
successively to the finest level, using the results at any
level as an initial approximation for the next level (this is
known as nested relaxation; see, e.g., [Terzopoulos, 1986a] for
the algorithm).

The recursive multigrid coordination strategies were targeted
at serial computers. Although effective on uniprocessors, they
are not designed to make optimal use of all available
processing elements in a massively parallel machine, even when
a spatially parallel Jacobi relaxation is employed. This is
because most of the time is spent performing relaxations on
only a single level, while processors on the other levels
remain idle, with the rest of the time spent transferring
results between pairs of adjacent levels. Very poor utilization
of processors results on average, since a significant
percentage of the iterations and transfers are performed on the
coarser levels.
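A rough feel for the idle-processor problem can be had from a back-of-the-envelope model. The sketch below is illustrative only and rests on assumptions that are not in the paper: one processor per node, a square 2-D hierarchy whose side halves from level to level, interlevel transfers ignored, and hypothetical function names. It computes the time-averaged fraction of processors that are busy when only one level relaxes at a time.

```python
def level_sizes(finest_side, levels):
    """Node counts, coarsest level first, for a square 2-D hierarchy in
    which the number of nodes per side halves from one level to the next."""
    return [(finest_side // 2 ** (levels - 1 - l)) ** 2 for l in range(levels)]


def average_utilization(finest_side, levels, sweeps=None):
    """Time-averaged fraction of processors busy when levels take turns.

    sweeps[l] is the (assumed) number of relaxation sweeps spent on level l,
    coarsest first; by default every level receives the same share of time.
    """
    sizes = level_sizes(finest_side, levels)
    sweeps = sweeps if sweeps is not None else [1] * levels
    # While level l relaxes, only its own sizes[l] processors are active.
    busy = sum(s * n for s, n in zip(sweeps, sizes))
    return busy / (sum(sizes) * sum(sweeps))
```

With equal time spent on each of L levels, this model gives an average utilization of 1/L regardless of grid size, whereas a scheme that relaxes all levels simultaneously would keep every processor busy.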

This deficiency has prompted proposals for increasing the level
of parallelism. One approach is via clever multilevel
decompositions of the problem, such that a concurrent
coordination of multigrid processes does not become
self-defeating. Brandt [1981] pursued some early ideas along
these lines in the context of vector supercomputers, which
offer a rather low level of parallelism, but the convergence
rates that he reported were not impressive. Gannon and
Rosendale [1982] proposed multilevel decompositions that render
the multigrid solution better suited to massively parallel
computers. In the context of visual surface reconstruction,
Terzopoulos [1984] suggested a multilevel decomposition similar
to Gannon's, but the idea remains to be implemented. Kuo [1985]
extended Gannon's approach to make more systematic use of
digital filtering theory.

While fully concurrent multigrid coordination across all levels
is clearly desirable in general, the need to solve piecewise
continuous reconstruction problems in early vision
[Terzopoulos, 1986b] raises particular computational issues
which must be resolved when designing multigrid algorithms for
vision. In particular, there arises a need to process
irregularly structured, sparse data input at multiple
resolutions. The data generally includes both constraints and
discontinuities.

The concurrent multigrid algorithms proposed in the literature
are not designed to handle visual data. As argued by Kuo,
proper application of the decomposition approach requires the
use of relatively large support filters. Unfortunately, such
filters require continuous regions and geometrically regular
boundary conditions. This virtually prohibits their application
to visual problems involving irregularly occurring constraints
and discontinuities.



The present paper takes a variational approach to concurrent
multigrid relaxation which promises, in principle, to resolve
these difficulties. The formulation is based on a multilevel
objective functional which is developed in Section 2. This
leads, in Section 3, to a relaxation subprocess for each level,
which includes not only neighboring nodes on that level, but
couples nearby nodes on the adjacent coarser and finer levels
as well. The subprocesses are driven concurrently across the
multiple levels. Section 4 proposes a dynamic strategy for
adjusting the coupling strengths between relaxation processes
during the iterative process in order to achieve a common
multilevel objective.


2.  Concurrent  Multilevel  Objective 

The objective of multilevel relaxation is to efficiently
compute discrete, multiresolution approximations to the
solutions of given continuous problems. Formally speaking,
discrete approximations characterize optima of a multilevel
objective functional. The approximations are constructed in
nested collections of finite dimensional approximating spaces.

2.1.  Nested  Finite  Element  Spaces 

Suppose, in particular, that the finite dimensional
approximating spaces are a family of $L$ finite element spaces
$S^1, \ldots, S^L$, nested such that
$S^1 \subset S^2 \subset \cdots \subset S^L$. It is convenient
for the size of the elements of $S^l$ to be given by
$h^l = 2^{L-l} h^L$, relative to the fundamental size $h^L$ of
elements in the finest space $S^L$. We will refer to $S^{l-1}$
and $S^{l+1}$ as being adjacent to $S^l$ (note that $S^1$ and
$S^L$ have only one adjacent space).

The basis functions of finite element spaces have local
support; that is, they involve only spatially proximal nodal
variables. When the nodal variables form a grid, it is
convenient to write the basis functions of $S^l$ as
$\phi^l_{i,j}$, where the subscripts $i,j$ identify a node of
the grid. One can then express a generic function
$v^l \in S^l$ as

$$ v^l = \sum_{i,j} v^l_{i,j} \, \phi^l_{i,j}, $$

where the $v^l_{i,j}$ denote the nodal variables associated
with $S^l$. The family of such functions is denoted
$\mathbf{v} = (v^1, \ldots, v^L) \in S^1 \times \cdots \times S^L$.

2.2. The Multilevel Objective Functional


Given the above definitions, the multilevel objective
functional
$\mathbf{E}(\mathbf{v}) : S^1 \times \cdots \times S^L \to \mathbb{R}^L$
may be defined as

$$ \mathbf{E}(\mathbf{v}) =
\begin{bmatrix} E^1(v^1) \\ \vdots \\ E^l(v^l) \\ \vdots \\ E^L(v^L) \end{bmatrix}
+ \mathbf{C}(\mathbf{v}). $$

The first term is derived from the energy functional
$\mathcal{E}(u)$ for the problem at hand, whose discrete
approximation in the finite element space $S^l$ is denoted by
$E^l(v^l) : S^l \to \mathbb{R}$. See
[Terzopoulos, 1983, 1985, 1986a] for detailed derivations of
discrete functionals in finite element spaces for a number of
vision problems. The vector functional
$\mathbf{C}(\mathbf{v})$ serves to couple the approximation
computed in each space $S^l$ to its adjacent spaces.

2.3.  The  Multilevel  Coupling  Functional 


The multilevel coupling functional can be written as

$$ \mathbf{C}(\mathbf{v}) =
\begin{bmatrix}
\mu^1 \| v^1 - \Pi^1 v^2 \|^2 \\
\vdots \\
\kappa^l \| v^l - I^l v^{l-1} \|^2 + \mu^l \| v^l - \Pi^l v^{l+1} \|^2 \\
\vdots \\
\kappa^L \| v^L - I^L v^{L-1} \|^2
\end{bmatrix}, $$

where it is convenient for $\| \cdot \|$ to be the Euclidean
norm. The constituent functionals involving $\kappa$ factors
impose a coarse-to-fine coupling between adjacent spaces, while
those involving $\mu$ factors impose the inverse fine-to-coarse
coupling. All coupling factors are real, nonnegative numbers,
and their magnitudes determine the strength of the interlevel
couplings. The coupling functional also involves two sets of
mappings (denoted $I$ and $\Pi$) to each $S^l$ from its
adjacent finite element spaces.


2.4. Interlevel Mappings


The interlevel mappings perform the required interlevel changes
of bases. The coarse-to-fine change of basis is accomplished by
an injection mapping

$$ I^l : S^{l-1} \to S^l, \qquad l = 2, \ldots, L, $$

while the converse fine-to-coarse change is accomplished by a
projection mapping

$$ \Pi^l : S^{l+1} \to S^l, \qquad l = 1, \ldots, L-1. $$

On grids, the interlevel mappings take the general form

$$ (I^l v^{l-1})_{i,j} = \sum_{m,n} \iota^l_{i,j;m,n} \, v^{l-1}_{m,n},
\qquad
(\Pi^l v^{l+1})_{i,j} = \sum_{m,n} \pi^l_{i,j;m,n} \, v^{l+1}_{m,n}, $$

which corresponds to matrix multiplications on the nodal
variables. In practice, local support mappings are employed,
which means that the coefficients $\iota^l_{i,j;m,n}$ and
$\pi^l_{i,j;m,n}$ are nonzero only within some neighborhood of
node $i,j$. This corresponds to multiplications with sparse
matrices. Indeed, it is most natural (though not absolutely
necessary) to specify the coefficients according to the local
interpolant of the finite element space in which the argument
function resides.
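To make the roles of the two mappings and of the coupling functional concrete, here is a minimal 1-D sketch. It is an illustration under stated assumptions rather than the paper's implementation: the grids are 1-D with a coarse node p coinciding with fine node 2p (0-based), the coarse-to-fine mapping is linear interpolation (the natural injection for piecewise-linear elements), the fine-to-coarse mapping simply samples coincident nodes, and all function names are hypothetical.

```python
def inject(coarse):
    """Coarse-to-fine mapping I: linear interpolation onto a grid with
    2*n - 1 nodes, so fine node 2p coincides with coarse node p."""
    fine = []
    for p in range(len(coarse) - 1):
        fine.append(coarse[p])
        fine.append(0.5 * (coarse[p] + coarse[p + 1]))
    fine.append(coarse[-1])
    return fine


def project(fine):
    """Fine-to-coarse mapping Pi: sample the coincident nodes (assumed)."""
    return fine[::2]


def sq_dist(a, b):
    """Squared Euclidean norm of the difference of two nodal vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coupling(levels, kappa, mu):
    """One coupling value per level; levels[0] is coarsest.

    kappa[l] weights the coarse-to-fine term of level l (unused for l = 0),
    mu[l] weights the fine-to-coarse term (unused for the finest level).
    """
    last = len(levels) - 1
    values = []
    for l, v in enumerate(levels):
        c = 0.0
        if l > 0:
            c += kappa[l] * sq_dist(v, inject(levels[l - 1]))
        if l < last:
            c += mu[l] * sq_dist(v, project(levels[l + 1]))
        values.append(c)
    return values
```

Both mappings touch only a node's immediate neighborhood, which is the local-support (sparse matrix) property described above; when every level samples the same linear function, all coupling values vanish.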


3.  Concurrent  Multilevel  Relaxation 


As a natural extension to numerically solving single-level
variational principles [Terzopoulos], the necessary condition
for optimizing the multilevel objective functional (i.e.,
characterizing $\mathbf{u} = \inf_{\mathbf{v}} \mathbf{E}(\mathbf{v})$
according to the vanishing of the gradient
$\nabla \mathbf{E}(\mathbf{v})$) results in a multilevel system
of algebraic equations. Provided that the given energy
functional $\mathcal{E}(u)$ derives from a well-posed
variational principle, the necessary condition will also be
sufficient, because the added multilevel coupling functional is
quadratic and positive definite due to the presence of norms.
The algebraic system will then have a unique solution, which
the multilevel relaxation process approximates iteratively.

Multilevel relaxation may be viewed as a vector intralevel
mapping

$$ \mathbf{R} : S^1 \times \cdots \times S^L \to S^1 \times \cdots \times S^L. $$

Its form at a generic node $i,j$ is determined by setting to
zero the partial derivative of the multilevel objective
functional with respect to $v^l_{i,j}$ for $l = 1, \ldots, L$.
The following general form results:

$$ \mathbf{R}(\mathbf{v})_{i,j} =
\begin{bmatrix}
R^1(v^1)_{i,j} + \mu^1 (\Pi^1 v^2 - v^1)_{i,j} \\
\vdots \\
R^l(v^l)_{i,j} + \kappa^l (I^l v^{l-1} - v^l)_{i,j} + \mu^l (\Pi^l v^{l+1} - v^l)_{i,j} \\
\vdots \\
R^L(v^L)_{i,j} + \kappa^L (I^L v^{L-1} - v^L)_{i,j}
\end{bmatrix}, $$

where the operators $R^l$ are standard local relaxation
operators which take the general form

$$ R^l(v^l)_{i,j} = \sum_{m,n} \rho^l_{i,j;m,n} \, v^l_{m,n}, $$

where the coefficients $\rho^l_{i,j;m,n}$ are nonzero only for
nodes $m,n$ which are proximal to node $i,j$.


The information flow within the hierarchy implied by the
concurrent multilevel relaxation scheme is illustrated in
Fig. 2.


Figure 2. Information flow within the multigrid hierarchy.


4. Dynamic Coupling

There remains the issue of making appropriate choices for the
coupling strengths $\kappa^l$ and $\mu^l$ so as to obtain
useful multilevel solutions. As a general rule, the relaxation
processes on the coarser scales suffer from increasingly large
discretization errors, but converge to the coarse solution
relatively quickly. Conversely, those on the finer scales are
increasingly accurate, but exhibit a substantially slower
response.

On the one hand, by setting the $\kappa^l$ to some intermediate
values, the finer relaxation processes are coupled to the
coarser ones by way of intermediate processes, so that the
fast-responding characteristics of the coarser relaxation
processes are, to some degree, extended to the entire grid
hierarchy. The coarse-to-fine coupling can be tightened to
heighten the effect, but beyond a certain point the poor
accuracy of the coarser levels corrupts the solutions computed
in the finest levels.

On the other hand, by setting the $\mu^l$ to some intermediate
values, the coarser relaxation processes are coupled to the
finer ones so that the higher accuracy of the finer
approximations permeates the whole multigrid hierarchy.
However, as the fine-to-coarse coupling is tightened, the
multigrid hierarchy tends to be infected by the slow response
characteristics of the finer relaxation processes.

Dynamic coupling resolves the dilemma. By appropriately
adjusting the coupling strengths during the iterative process,
the multigrid hierarchy simultaneously inherits a fast response
and a consistently high accuracy. A general strategy for
adjusting the coupling factors is the following. Initially
there is a strong coarse-to-fine interaction. This accelerates
the convergence rate of the finer relaxation processes during
the early iterations, when the approximation is rather poor.
This interaction decays as the approximation improves, and is
replaced by a strengthening fine-to-coarse interaction, which
eventually enables the accurate approximations computed on the
finer levels to dominate and improve the accuracy of the
coarser approximations. In particular, one can choose

$$ \kappa^l(t) = \kappa^l_0 \, a^t; \qquad \mu^l(t) = \mu^l_0 \, (1 - a^t), $$

where $t$ is the iteration index and $a$ is a decay in the
range $[0, 1]$, to obtain curves such as those illustrated in
Fig. 3.
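The exponential schedule of this section, read here as kappa(t) = kappa0 * a**t and mu(t) = mu0 * (1 - a**t), is simple enough to state directly in code. The function and argument names below are hypothetical, and the sketch assumes one pair of initial strengths shared by the call site rather than a particular per-level choice.

```python
def coupling_schedule(t, kappa0, mu0, a):
    """Coupling strengths at iteration t for decay factor a in [0, 1].

    Returns (kappa, mu): the coarse-to-fine strength starts at kappa0 and
    decays geometrically, while the fine-to-coarse strength starts at zero
    and rises toward mu0.
    """
    decay = a ** t
    return kappa0 * decay, mu0 * (1.0 - decay)
```

At t = 0 the coupling is purely coarse-to-fine; for any a strictly between 0 and 1 it shifts monotonically toward purely fine-to-coarse as t grows, with a closer to 1 giving a slower handover.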


5.  An  Example 


For a concrete example of a concurrent multilevel relaxation
process, consider the regularization solution of the surface
reconstruction problem using a thin plate surface spline (see
[Terzopoulos, 1983]). A Jacobi version of the relaxation
formula that was obtained for an interior node of the level $l$
grid is given by

$$ R^l(v^l)_{i,j} = \frac{1}{20 + (h^l)^2 \alpha^l} \Big[
8 \, (v^l_{i-1,j} + v^l_{i+1,j} + v^l_{i,j-1} + v^l_{i,j+1})
- 2 \, (v^l_{i-1,j-1} + v^l_{i+1,j-1} + v^l_{i-1,j+1} + v^l_{i+1,j+1})
- (v^l_{i-2,j} + v^l_{i+2,j} + v^l_{i,j-2} + v^l_{i,j+2})
+ (h^l)^2 \alpha^l \, d^l_{i,j} \Big], $$

where $\alpha^l$ is the constraint parameter associated with
the scattered depth constraints $d^l_{i,j}$ input on level $l$.

The above formula may be used in the concurrent multilevel
relaxation process

$$ \mathbf{v}^{(t+1)} = \mathbf{R}(\mathbf{v}^{(t)}), $$

where $t$ is the iteration index, in conjunction with a simple
coarse-to-fine coupling term

$$ \kappa^l \, (v^{l-1}_{(i+1)/2,\,(j+1)/2} - v^l_{i,j}); \quad \text{for } i,j \text{ odd, and zero otherwise,} $$

and the analogous fine-to-coarse coupling term

$$ \mu^l \, (v^{l+1}_{2i-1,\,2j-1} - v^l_{i,j}); \quad \text{for all } i,j. $$

The effect of these strictly local expressions can be
interpreted physically as unilateral springs with stiffnesses
$\kappa^l$ and $\mu^l$, which couple at coincident grid
coordinates the thin plate splines on adjacent levels (imagine
the vertical connections in Fig. 1 to be the springs). The
natural smoothing of the relaxation process permits the
even-numbered nodes to remain uncoupled from the adjacent
coarser level.
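The spring picture can be tried out on a much smaller problem. The sketch below is a simplified 1-D analogue and not the formula of this section: it assumes a membrane (string) model in place of the thin plate, folds the spring terms into the weighted average of a Jacobi update instead of adding them separately, keeps the coupling strengths constant, holds the end nodes fixed, and uses hypothetical names throughout. Indices are 0-based, so the coincident fine nodes are the even ones (the odd ones in the 1-based convention used above).

```python
def concurrent_relaxation(data, kappa, mu, sweeps, beta=1.0):
    """Relax all levels of a 1-D hierarchy simultaneously.

    data   -- list of levels, coarsest first; each level is a list holding a
              depth value or None (no constraint) per node. A level with n
              nodes is followed by one with 2*n - 1 nodes. End nodes must
              carry data and are held fixed.
    kappa  -- stiffness of the spring to the coincident coarser node
    mu     -- stiffness of the spring to the coincident finer node
    beta   -- weight of a depth constraint (assumed, lumps h^2 * alpha)
    """
    v = [[d if d is not None else 0.0 for d in level] for level in data]
    last = len(v) - 1
    for _ in range(sweeps):
        new = [level[:] for level in v]       # Jacobi: read old, write new
        for l in range(len(v)):               # every level is updated in each sweep
            for i in range(1, len(v[l]) - 1):
                num = v[l][i - 1] + v[l][i + 1]
                den = 2.0
                if data[l][i] is not None:    # scattered depth constraint
                    num += beta * data[l][i]
                    den += beta
                if l > 0 and i % 2 == 0:      # spring to the coarser level
                    num += kappa * v[l - 1][i // 2]
                    den += kappa
                if l < last:                  # spring to the finer level
                    num += mu * v[l + 1][2 * i]
                    den += mu
                new[l][i] = num / den
        v = new
    return v
```

Because each sweep reads only values from the previous sweep, the inner loops over levels and nodes could run in any order or in parallel, which is the point of the concurrent scheme; nodes at odd 0-based positions carry no spring to the coarser level and are reached only through smoothing within their own level.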

The reader should bear in mind that analogous relaxation
formulas result from more sophisticated surface models which
allow arbitrary discontinuity boundaries to be introduced on
the grids [Terzopoulos, 1985; 1986b], as well as from related
image analysis problems [Terzopoulos, 1986a].


6. Summary and Discussion


Figure 3. The magnitudes of the coupling factors during the
iterative process.

We have developed a new multilevel relaxation strategy that
exploits a greater degree of parallelism. In contrast to
conventional, recursive coordination schemes, the new
coordination strategy is fully concurrent: it maintains
processors on all levels busy performing simultaneous
relaxation operations.

The concurrent coordination strategy aims to optimize a
multilevel objective functional, each of whose terms has three
components: (1) a discrete version of the given functional on
each level of a multigrid hierarchy, (2) an additive functional
coupling each level (except the finest) to the next finer
level, and (3) an additive functional coupling each level
(except the coarsest) to the next coarser level.

The interlevel coupling functionals are designed so that the
scheme will be convergent. They involve coupling factors that
are modified during the iterative process such that there is an
initially strong but gradually weakening coarse-to-fine
interaction, which accelerates convergence, and an initially
weak but gradually strengthening fine-to-coarse interaction,
which ultimately yields consistent accuracy on all levels.

We have implemented a concurrent multigrid algorithm for the
problem of computing visible-surface representations as
formulated in [Terzopoulos, 1985]. Preliminary experiments with
the algorithm are encouraging. The efficiency of the concurrent
algorithm on a uniprocessor is comparable to that of its
conventional multigrid counterpart (the efficiency of the
latter is studied in [Terzopoulos, 1984]). We also noted that
the concurrent algorithm is significantly easier to implement
than its conventional counterpart.

Among the many issues that beckon further study are:

• The implementation of concurrent multigrid schemes in
parallel hardware. Some analysis of the implications can be
found in [Brandt, 1981] and [Gannon and Rosendale, 1982].

• The investigation of coupling functionals that are not based
on common norms. This will lead to more sophisticated methods
for fusing information across scales.

• The use of more complicated interlevel mappings in the
coupling functional. This will suggest, for example, nontrivial
ways of treating discontinuities across scales.

• Reformulating the concurrent multigrid approach to generate
distributed multiresolution representations, such as the
relative depth representations suggested in
[Terzopoulos, 1984]. One possibility, which draws from the
previous point, is to use interlevel couplings that involve
weighted sums of nodal variables on a series of coarser levels
(along with residual input data [illegible]).


Acknowledgements

I thank Michael Brady for challenging me to develop a
concurrent multilevel relaxation algorithm for vision.

This report describes research done at the Artificial
Intelligence Laboratory of the Massachusetts Institute of
Technology. Support for the laboratory's Artificial
Intelligence research is provided in part by the Advanced
Research Projects Agency of the Department of Defense under
Office of Naval Research contract N00014-80-C-0505 and the
System Development Foundation.


References 

Batcher, K.E., [1980], "Design of a massively parallel
processor," IEEE Trans. Computers, C-29.

Brandt, A., [1977], "Multi-level adaptive solutions to
boundary-value problems," Math. Comp., 31, 333-390.

Brandt, A., [1981], "Multigrid solvers on parallel computers,"
Elliptic Problem Solvers, M.H. Schultz (ed.), Academic Press,
New York, 39-83.

Gannon, D., and Rosendale, J.V., [1982], Highly parallel
multigrid solvers for elliptic PDEs: An experimental analysis,
ICASE, NASA Langley Research Center, Hampton, VA, ICASE Report
82-36.

Glazer, F., [1984], "Multilevel relaxation in low level
computer vision," Multiresolution Image Processing and
Analysis, A. Rosenfeld (ed.), Springer-Verlag, New York,
312-330.

Hackbusch, W., and Trottenberg, U., (ed.), [1982], Multigrid
Methods, Lecture Notes in Mathematics, Vol. 960,
Springer-Verlag, New York.

Hillis, W.D., [1985], The connection machine, Ph.D. thesis,
Department of Electrical Engineering and Computer Science, MIT,
Cambridge, MA.

Kuo, C.C., [1985], Parallel algorithms and architectures for
solving elliptic partial differential equations, MIT Lab for
Information and Decision Systems, Cambridge, MA, LIDS-TH-1432.

Rosenfeld, A. (ed.), [1984], Multiresolution Image Processing
and Analysis, Springer-Verlag, New York.

Terzopoulos, D., [1983], "Multilevel computational processes
for visual surface reconstruction," Computer Vision, Graphics,
and Image Processing, 24, 52-96.

Terzopoulos, D., [1984], Multiresolution computation of
visible-surface representations, Ph.D. thesis, Department of
Electrical Engineering and Computer Science, MIT, Cambridge,
MA.

Terzopoulos, D., [1985], Computing visible-surface
representations, MIT AI Lab., Cambridge, MA, AI Memo No. 800.

Terzopoulos, D., [1986a], "Image analysis using multigrid
relaxation methods," IEEE Trans. Pattern Analysis and Machine
Intelligence, PAMI-8, to appear.

Terzopoulos, D., [1986b], "Regularization of inverse visual
problems involving discontinuities," IEEE Trans. Pattern
Analysis and Machine Intelligence, PAMI-8, to appear.




Trinocular Vision

Using Photometric and Edge Orientation Constraints*


Victor J. Milenkovic and Takeo Kanade

Department of Computer Science
Carnegie-Mellon University
Pittsburgh, Pennsylvania


Abstract 

Trinocular vision is stereo using three non-collinear views. It
has been shown in the literature that a third view aids in the
selection of matching pairs of edge points from the first two
views by providing a constraint on the positions of the points.
In addition to this positional constraint, this paper proposes
two new constraint principles for use in determining the set of
correct matches. The first principle constrains the
orientations of the matched edge pixels, and the second
principle constrains the image intensity values in the regions
surrounding the edge pixels. Statistical confidence measures
and rejection thresholds are derived from these constraint
principles in order to maximize the number of correct matches
in the presence of error. A trinocular stereo algorithm based
on these principles is described and applied to synthetic and
real images with good results.


1. Introduction

A typical method of binocular stereo vision has three stages.
First, the vision system extracts edges from gray level images
taken from two known viewpoints. Next, the system attempts to
match edge pixels in one image with edge pixels in the other
image in order to determine the depth of each edge point.
Finally, some form of surface fitting extends the sparse depth
data to the entire image. One of the most difficult tasks in
this method is to disambiguate multiple matches in the second
stage. A number of papers have shown that adding a third view
of the same scene can greatly serve to eliminate incorrect
matches. The extra view provides a geometric constraint on the
position of edge pixels [4] and [3] or on the positions of
other features [1]. For correlation based matching, it has been
shown that a measure based on multiple images has much higher
and better defined peaks [7] than that of a two image
correlation.

In this paper, two new constraints on the matched edge points
are proposed. The first is a geometric constraint on the
orientations of the edges which holds only when the number of
cameras is greater than two. The second new principle proposed
is a photometric constraint on the intensities of light
measured in the vicinity of the edge points. To provide
insensitivity to noise, these constraints are transformed into
statistical confidence measures. These confidence measures are
combined with the geometric position constraint used by the
other trinocular methods to create a new matching algorithm.
The use of statistical methods enables the matching algorithm
to detect cases of occlusion and also to identify edges arising
from occluding contours.


* This research was sponsored by the Defense Advanced Research
Projects Agency (DOD), ARPA Order No. 3597, and monitored by
the Air Force Avionics Laboratory under Contract
F33615-78-C-1551. The views and conclusions in this document
are those of the author(s) and should not be interpreted as
representing the official policies, either expressed or
implied, of the Defense Advanced Research Projects Agency or
the U.S. Government.

The trinocular matching algorithm is demonstrated on both
synthetic and real data. It performs as well as a good
binocular scheme and also matches horizontal edges. The
algorithm works very well despite the fact that it uses none of
the higher level assumptions that binocular schemes must use.
In general, binocular methods require an assumption of depth
continuity as well as other constraints on the disparity range
in order to reduce the number of candidate matches. In order to
exploit these assumptions, computationally expensive methods
such as relaxation [3], dynamic programming [6], or multiple
resolution matching must be used. The trinocular method
demonstrated here does not require these assumptions; it looks
only at individual edge pixels and thus is computationally
efficient.

2.  Constraint  Principles 

A trinocular matching algorithm can use at least three
constraint principles to discriminate correct matches from
incorrect ones. Two are geometric constraints on the edge pixel
positions and orientations. The third is a photometric
constraint on the image intensity near the edge. The efficiency
of the algorithm depends on the first position constraint,
which is in fact the strongest of the three. The other two
constraints provide additional support for correct matches.
This section introduces the geometric and photometric
principles behind these constraints, and Section 3 describes
the statistical techniques needed in order to exploit the edge
orientation and image intensity constraints in the presence of
noise.

In order to maximize the additional information it provides,
the third camera should not lie collinear with the other two
cameras. Optimally, the three cameras are positioned in an
equilateral triangle. Two of the cameras, called left and
right, are set up side by side in the standard arrangement for
two camera stereo. The third camera, the up camera, sits
between the first two, above or below them to complete the
triangle. For simplicity of calculations, the cameras should
all be directed parallel to each other.


2.1 Position Constraint

The trinocular position constraint is a simple extension of the
one which exists for the binocular matching algorithm. A given
edge pixel in the left image can match only edge pixels in the right
image which lie on an epipolar line. Since the left and the right
cameras differ in horizontal position only, the epipolar lines are
the scan lines of the images. But the constraint principle also
holds for the left and up cameras and the right and up cameras. In
those cases the epipolar lines do not correspond to scan lines
because they are inclined by 60 and 120 degrees, but the epipolar
constraint holds nevertheless. The two lines meet in the up image
at an angle of 60 degrees and therefore always determine an
unambiguous location for the third edge pixel in a matched triple.
Ordinarily, an edge pixel in the left image and an edge pixel in the
right image cannot match unless a corresponding edge pixel
exists at the correct location in the up image, and thus the third
image provides a geometric position constraint not available to the
binocular algorithm. The trinocular algorithm requires little or no
search in the up image, and thus it is roughly as efficient as the
simplest binocular algorithm.
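
The position constraint can be illustrated with a minimal sketch. It assumes an idealized rig that the text does not spell out in detail: three parallel pinhole cameras with focal length `f`, the left camera at the origin, the right camera one baseline `b` along x, and the up camera at the apex of the equilateral triangle. The function names and the sign convention (image y increasing toward the up camera) are illustrative assumptions, not taken from the report.

```python
import math

def project(point, camera_xy, f=1.0):
    """Pinhole projection of a 3-D point into a camera at (cx, cy, 0) looking along +z."""
    x, y, z = point
    cx, cy = camera_xy
    return (f * (x - cx) / z, f * (y - cy) / z)

def predict_up(left_px, right_px):
    """Predict the up-image position from a left/right pair on the same scan line.

    Assumes the equilateral layout described above, so the two epipolar lines in
    the up image are inclined at 60 and 120 degrees and meet at a single point.
    """
    xl, yl = left_px
    xr, _ = right_px
    d = xl - xr                                   # left/right disparity
    return (xl - 0.5 * d, yl - 0.5 * math.sqrt(3.0) * d)

b = 1.0                                           # assumed baseline (triangle side)
cams = {"left": (0.0, 0.0), "right": (b, 0.0), "up": (b / 2, b * math.sqrt(3.0) / 2)}
p = (0.3, -0.2, 10.0)                             # an arbitrary scene point
left, right, up = (project(p, cams[name]) for name in ("left", "right", "up"))
predicted = predict_up(left, right)
```

Under these assumptions `predicted` coincides with the true projection `up`, which is why no search in the third image is needed once a left/right pair has been hypothesized.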

2.2  Orientation  Constraint 

An edge detection algorithm can respond only to changes in
intensity in the image. These intensity changes can have a
number of causes, all having to do with changes in the surface of
the object as seen by the camera.

• The surface may have an abrupt change in reflectivity,
such as at the boundary of two regions with different
colors.


2.3  Photometric  Constraint 

As has been remarked above, not all edge pixels are useful for
purposes of matching. In order to be useful, an edge pixel must
correspond to a point on some viewpoint independent contour,
such as the boundary between surfaces of differing reflectivity or
surface normal. Of course, one of the surfaces may not be visible,
as in the case of an occluding contour, but at least one, and
usually both, of the surfaces meeting at the contour are visible
from all three cameras. In general an edge pixel splits the region
nearby in the image into two parts, one on each side of the edge.
One side has somewhat depressed intensity, the other somewhat
elevated intensity. These darker and lighter sides correspond to
the surfaces of different orientation or reflectivity on the object
that meet at the contour that causes the edge pixel. Moreover, the
darker side regions of matching edge pixels from different images
will correspond to the same surface on the object. The lighter side
regions will correspond to another common surface.

Suppose we sample the images near edge pixels which are
supposed to match. As in figure 2-2, the $k$ sample points on the
darker side of the pixel in the left image have intensities
$I^-_{L,1}, \ldots, I^-_{L,k}$, the sample points on the lighter side of the left pixel
have intensities $I^+_{L,1}, \ldots, I^+_{L,k}$, and so on. Define the means of
the sampled points as follows.




• The surface normal may be discontinuous, as at a
polyhedral edge.

• The surface may vanish because it is occluded by
another surface or it occludes itself.

In general these conditions can occur in any combination, but if
there is no reflectivity or surface normal discontinuity, the
resulting edges depend on the viewpoint and thus are useless for
matching. If either of the first two conditions holds, then there is a
viewpoint independent curve in space which is the boundary of
discontinuity.

The edge pixels in an image correspond to points on the contour
of discontinuity. The measured orientations of the edges are the
projections of the tangent lines to the contour. Suppose the edge
orientation vectors $Q_L$, $Q_R$, and $Q_U$ are projections of the
contour tangent vector $v$ as in figure 2-1. Let the vector
displacements from camera to image plane point be $D_L$, $D_R$, and
$D_U$. The vector $v$ must lie in the plane determined by $D_L$ and $Q_L$,
the plane determined by $D_R$ and $Q_R$, and the plane determined by
$D_U$ and $Q_U$. Therefore the normal vectors to these three planes,
$N_L$, $N_R$, and $N_U$, must lie in the plane perpendicular to $v$, and thus
the triple product of these three vectors must be zero.

$$N_L = \frac{D_L \times Q_L}{|D_L \times Q_L|}, \qquad N_R = \frac{D_R \times Q_R}{|D_R \times Q_R|}, \qquad N_U = \frac{D_U \times Q_U}{|D_U \times Q_U|}$$

$$[N_L, N_R, N_U] = 0 \quad \text{(triple product)}.$$
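
A small numerical check of this orientation constraint is sketched below. It assumes parallel pinhole cameras looking along +z with the image plane at unit distance; the helper names (`view`, `orientation_residual`) and the sample geometry are hypothetical and chosen only for illustration.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def view(point, tangent, camera):
    """Return (D, Q): the ray to the image point (scaled to z = 1) and the
    image-plane direction of the projected tangent line."""
    r = tuple(p - c for p, c in zip(point, camera))
    d = tuple(x / r[2] for x in r)
    # Derivative of the projection of point + t * tangent at t = 0 (z component is 0).
    q = tuple(tangent[i] * r[2] - r[i] * tangent[2] for i in range(3))
    return d, unit(q)

def orientation_residual(views):
    """Triple product [N_L, N_R, N_U] of the unit normals N = D x Q / |D x Q|."""
    n_l, n_r, n_u = (unit(cross(d, q)) for d, q in views)
    return dot(n_l, cross(n_r, n_u))

cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3.0) / 2.0, 0.0)]
point = (0.2, 0.1, 10.0)
v = (0.3, -0.5, 0.4)                       # true contour tangent
consistent = [view(point, v, c) for c in cameras]
# Replace the up view by the image of a different tangent to mimic a false match.
mismatched = consistent[:2] + [view(point, (0.3, 0.5, 0.4), cameras[2])]
good = orientation_residual(consistent)
bad = orientation_residual(mismatched)
```

For the consistent triple the residual vanishes up to rounding, while the mismatched triple gives a clearly nonzero value.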


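$$\mu_L^- = \frac{1}{k}\sum_{i=1}^{k} I^-_{L,i}, \qquad \mu_L^+ = \frac{1}{k}\sum_{i=1}^{k} I^+_{L,i},$$

and similarly $\mu_R^-$, $\mu_R^+$, $\mu_U^-$, $\mu_U^+$ for the right and up images.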


If the darker sides do indeed correspond to the same surface on
the object, then one would expect the means $\mu_L^-$, $\mu_R^-$, $\mu_U^-$ to be
approximately equal. Similarly, one would expect the lighter side
means to be close in value.

The photometric constraint requires more assumptions than the
geometric constraints, and these assumptions are couched in
approximate terms rather than exact equalities. The photometric
constraint is therefore the weakest, yet it is still quite useful. If one
assumes that

• the surface on each side of the edge is indeed visible
from all three cameras,

• the surfaces have roughly homogeneous values of
reflectivity and surface normal away from the
discontinuity,

• the observed intensity of a point does not change
appreciably with camera location,

then one would expect to see a close match among the darker side
means and the lighter side means.




Figure 2-1: Camera Geometry for Edge Orientation


Figure 2-2: Sampling Images for Photometric Constraint


3.  Statistical  Matching 

If the two geometric constraints and one photometric constraint
held perfectly, then a matching algorithm could easily determine
which candidate triples of edges were correct matches and which
were not. Unfortunately, there are a number of sources of error
that can cause a correct match to only approximately satisfy the
geometric and photometric constraints, and these errors can also
cause a non-matching triple to nearly satisfy these constraints and
thus appear like a correct match. One error source is the physical
limitations of the equipment, such as finite digitization, which can alter
measured values. Occluding contours, if they arise from curved
surfaces, and specular reflection create features that depend on
the camera position and are therefore not matchable using the
methods of this paper. The photometric constraint is based on a
simple lighting and surface reflectance model. Cases where these
simplifying assumptions do not hold can have moderate to large
deviations from expected intensity values.

Even given these limitations, we can still maximize the number of
correct matches by relying on a statistical algorithm. The
following steps are necessary to generate such an algorithm:

• Assume the errors satisfy a known statistical
distribution.

• Derive confidence measures that can be applied to
evaluate hypothesized matches and to choose the
best of a group of competing matches.

• Derive also failure thresholds which indicate which
values of the confidence measure are too low to
correspond to correct matches.

In case of multiple candidate matches for a feature, the algorithm
chooses the one with the best confidence measure. Then it
compares this measure against a failure threshold to determine if
it should select this candidate as a correct match.

Sections 3.1 and 3.2 describe the general method for the
derivation of confidence measures and failure thresholds. The
following sections 3.3 and 3.4 describe the specific confidence
measures and failure thresholds used for the edge orientation
constraint and the local image intensity constraint.

3.1  Confidence  Measures 

In order to describe the derivation of the statistical confidence
measure, let us use the following notation:

• The set of properties on the object will be denoted by
$P$. For the edge orientation constraint, the relevant
property will be the position of the contour point on
the object and the tangent line to the contour at that
point. For the photometric constraint, the property
will be the light intensity reflected from a small patch
of object surface.

• The set of values of observed features will be denoted
by $F$ (actually $F_L$, $F_R$, $F_U$ for the three images). These
feature values will be either edge orientation or image
intensity.

In the first place, the set of object properties $p \in P$ must satisfy
some statistical distribution $D_P(p)$. For example, we may know
that the tangent line orientations are often distributed in the
vertical and horizontal directions because of gravity. Secondly,
we must know the distribution of possible image feature values,
$f_L \in F_L$ in the case of the left image, which can result as the
projection of a given object property $p \in P$. In the ideal case, there
will be no error in the imaging process and thus only one feature
value $f_L$ will correspond to an object value $p$. In general, however,
errors and noise will spread the observed values into a distribution
$D^L_{PF}(p, f_L)$. The notation is similar for the right and up images.

By combining the distribution of object properties with the
distributions of feature values in each image corresponding to a
given property value, we can derive the distribution for the set of
property values and corresponding feature values,

$$D_{PF}(p, f_L, f_R, f_U) = D_P(p)\, D^L_{PF}(p, f_L)\, D^R_{PF}(p, f_R)\, D^U_{PF}(p, f_U).$$

This combined distribution is not directly useful because we cannot know
the value of the object property $p$, and we can only directly
observe the values $f_L$, $f_R$, $f_U$. Thus we integrate over $p \in P$ to
obtain the distribution of observed feature values,

$$D_F(f_L, f_R, f_U) = \int_{p \in P} D_{PF}(p, f_L, f_R, f_U)\, dp.$$

This likelihood of observed feature values is the confidence
measure we are seeking.
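
As a toy illustration of this integration, the sketch below uses a scalar object property that is uniform on [0, 1] and an assumed uniform error of half-width `e` in each image. Both modelling choices, and the function names, are hypothetical simplifications rather than the distributions used later in the report.

```python
def box(delta, e):
    """Assumed error model: uniform density on [-e, e]."""
    return 1.0 / (2.0 * e) if abs(delta) <= e else 0.0

def confidence(f_l, f_r, f_u, e=0.05, steps=20000):
    """Midpoint-rule estimate of D_F(f_L, f_R, f_U): the integral over p in [0, 1]
    of D_P(p) * D_PF^L(p, f_L) * D_PF^R(p, f_R) * D_PF^U(p, f_U)."""
    dp = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * dp
        prior = 1.0                      # D_P(p): uniform on [0, 1]
        total += prior * box(f_l - p, e) * box(f_r - p, e) * box(f_u - p, e) * dp
    return total

consistent = confidence(0.50, 0.52, 0.49)    # features that could share one p
inconsistent = confidence(0.20, 0.50, 0.80)  # no single p explains all three
```

Feature triples that a single property value can explain receive a high likelihood, while triples with no common explanation receive zero under this bounded error model.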

Suppose that for a feature in the left image with value $f_L$, we
have two hypothesized matches $A$ and $B$ with feature value triples
$\langle f_L, f_R^A, f_U^A \rangle$ and $\langle f_L, f_R^B, f_U^B \rangle$.
Let,

$$\alpha = D_F(f_L, f_R^A, f_U^A), \qquad \beta = D_F(f_L, f_R^B, f_U^B).$$

Given that we must choose between $A$ and $B$ as the correct match,
if $\alpha > \beta$, we should choose $A$ because, on average, $\alpha/(\alpha + \beta)$
fraction of the time, it will indeed be the correct match. Thus we
will make the correct choice that fraction of the time.

3.2  Failure  Threshold 

So far the analysis has shown us how to choose between
competing candidate matches. We would also like to know if the
match with the best confidence value is actually likely to be
correct. Some sort of failure threshold for the confidence value is
necessary. By definition an incorrectly hypothesized match
corresponds to three features which are generated by different
parts of the object and are therefore statistically independent. The
distribution of feature values $f_L$ generated independently is,

$$D^L_{IND}(f_L) = \int_{p \in P} D_P(p)\, D^L_{PF}(p, f_L)\, dp.$$

If the feature values are statistically independent, the distribution
for a triple $\langle f_L, f_R, f_U \rangle$ is the product of the individual
distributions,

$$D_{IND}(f_L, f_R, f_U) = D^L_{IND}(f_L)\, D^R_{IND}(f_R)\, D^U_{IND}(f_U).$$

Suppose correct matches constitute fraction $p_{true}$ of the set of
hypothesized matches, and suppose false matches constitute
fraction $p_{false}$.

Let,

$$T = p_{true}\, D_F(f_L, f_R, f_U),$$

$$F = p_{false}\, D_{IND}(f_L, f_R, f_U).$$

If $T > F$ then the feature triple is more likely to have arisen from a
single object property; if $T < F$, the feature values are more likely
to have arisen independently. Thus in order to select correct
matches, we should compare the confidence measure of a match,
$D_F(f_L, f_R, f_U)$, to a failure threshold, accepting the match only when

$$D_F(f_L, f_R, f_U) > \frac{p_{false}}{p_{true}}\, D_{IND}(f_L, f_R, f_U).$$
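
The selection rule of sections 3.1 and 3.2 can be sketched as follows. The dictionary layout, the function names, and the numeric values are hypothetical; the likelihoods are assumed to have been computed elsewhere.

```python
def relative_confidence(alpha, beta):
    """Fraction of the time the candidate with likelihood alpha is the correct one
    when exactly one of two candidates is correct."""
    return alpha / (alpha + beta)

def passes_threshold(likelihood, independent, p_true, p_false):
    """T > F test: p_true * D_F against p_false * D_IND."""
    return p_true * likelihood > p_false * independent

def select_match(candidates, p_true, p_false):
    """Pick the candidate with the best confidence, then apply the failure threshold.

    Each candidate is a dict with keys 'label', 'likelihood' (D_F) and
    'independent' (D_IND). Returns the chosen label, or None if the best fails.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["likelihood"])
    if passes_threshold(best["likelihood"], best["independent"], p_true, p_false):
        return best["label"]
    return None

a = {"label": "A", "likelihood": 0.6, "independent": 0.1}
b = {"label": "B", "likelihood": 0.2, "independent": 0.1}
chosen = select_match([a, b], p_true=0.2, p_false=0.8)
only_weak = select_match([b], p_true=0.2, p_false=0.8)
```

With these illustrative numbers candidate A is selected, while B on its own falls below the failure threshold and no match is reported.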


3.3  Edge  Orientation  Constraint 

For the case of the geometric edge orientation constraint, the
feature values are the angles $\langle \theta_L, \theta_R, \theta_U \rangle$ of the edge pixels in the
three images. The object property is the orientation of the tangent
line to the contour curve point on the object which generates the
edge pixels. The value of the property is expressed as a point $v$ on
the Gaussian sphere. Let $D_L$ be a vector which points along the
ray from the camera center to the edge pixel and corresponding
contour curve point as in figure 2-1. Let,

$$Q_L = (\cos\theta_L, \sin\theta_L, 0),$$

be a vector in the image plane aligned with the orientation of the
edge pixel ($Q_L$ and $\theta_L$ can be used interchangeably). The vectors
$D_L$, $Q_L$, and $v$ must lie in a plane, and therefore,

$$[D_L, Q_L, v] = 0 \quad \text{(triple product)},$$

$$Q_L = z \times (D_L \times v), \quad \text{where } z = (0, 0, 1),$$

up to normalization. The tangent line orientation $v$ determines $Q_L$ because $Q_L$ is also
constrained to lie in the image plane. In general the values of both
$Q_L$ and $Q_R$ are necessary and sufficient to determine $v$.

$$V = (D_L \times Q_L) \times (D_R \times Q_R)$$
$$v = V/|V|.$$

These relationships hold exactly only in the ideal case, but
whatever form the distribution $D^L_{PF}(v, \theta_L)$ (the probability that the
image of $v$ in the left image has orientation $\theta_L$) takes, it will have a
peak at the value of $\theta_L$ that corresponds to,

$$Q_L = z \times (D_L \times v).$$
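
The two relations above can be exercised numerically: recover the tangent direction from the left and right orientations, then predict the orientation expected in the up image. The sketch assumes parallel pinhole cameras with the image plane at unit distance; `ray`, `edge_angle`, and `tangent_from_two_views` are hypothetical helper names.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

Z = (0.0, 0.0, 1.0)

def ray(point, camera):
    """D: displacement from the camera center toward the point, scaled so z = 1."""
    r = tuple(p - c for p, c in zip(point, camera))
    return tuple(x / r[2] for x in r)

def edge_angle(d, v):
    """Angle in [0, pi) of Q = z x (D x v), the ideal image orientation of tangent v."""
    q = cross(Z, cross(d, v))
    return math.atan2(q[1], q[0]) % math.pi

def tangent_from_two_views(d_l, theta_l, d_r, theta_r):
    """v = V / |V| with V = (D_L x Q_L) x (D_R x Q_R); the sign of v is arbitrary."""
    q_l = (math.cos(theta_l), math.sin(theta_l), 0.0)
    q_r = (math.cos(theta_r), math.sin(theta_r), 0.0)
    return unit(cross(cross(d_l, q_l), cross(d_r, q_r)))

cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3.0) / 2.0, 0.0)]
point = (0.2, 0.1, 10.0)
v_true = unit((0.3, -0.5, 0.4))
d_l, d_r, d_u = (ray(point, c) for c in cameras)
theta_l, theta_r, theta_u = (edge_angle(d, v_true) for d in (d_l, d_r, d_u))
v_est = tangent_from_two_views(d_l, theta_l, d_r, theta_r)
theta_u_pred = edge_angle(d_u, v_est)
```

In the error-free case the predicted up-image angle equals the observed one, which is the relationship the orientation confidence measure tests in the presence of noise.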

3.3.1  Ideal  Two  Camera  Distribution 
Let us assume that $D_P(v)$ is a uniform distribution on the
Gaussian sphere. In other words, all contour tangent line
directions are equally likely. In the ideal case, the distribution
$D^L_{PF}(v, Q_L)$ is a delta function about,

$$Q_L = z \times (D_L \times v),$$

and each $v$ projects to unique values of $Q_L$, $Q_R$, and $Q_U$ in the
three images. Nevertheless, as has been shown in [2], the set of
pairs $\langle Q_L, Q_R \rangle$ do not form a uniform distribution even under
these ideal circumstances. It is much more likely that matching
edges in the right and the left images have the same orientation
than it is that they have widely different orientations. The
following is a derivation of a formula for this distribution
$D_F(Q_L, Q_R)$ which is simpler and more efficient to compute than
that given in [2]. The value of the distribution is the ratio
between the area of a small region about $v$ on the Gaussian
sphere to the area of the projection of that region into the space of
orientation pairs, $\langle \theta_L, \theta_R \rangle$. This ratio is the area spanned by $v$
under infinitesimal changes in $\theta_L$ and $\theta_R$ as in figure 3-1.


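$$V = (D_L \times Q_L) \times (D_R \times Q_R) \qquad (1)$$

$$v = \frac{V}{|V|} \quad \text{(unit direction of tangent)},$$

$$D_F(Q_L, Q_R) = \frac{1}{4\pi} \left| \frac{\partial v}{\partial \theta_L} \times \frac{\partial v}{\partial \theta_R} \right|.$$

The factor of $1/(4\pi)$ normalizes the result so that the integral of the
distribution over all pairs of angles is unity.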


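Because $v$ is of unit length and is perpendicular to
$\partial v/\partial\theta_L$ and $\partial v/\partial\theta_R$, the area of the parallelogram formed by
$\partial v/\partial\theta_L$ and $\partial v/\partial\theta_R$ equals the volume of the parallelepiped
formed by $v$, $\partial v/\partial\theta_L$, and $\partial v/\partial\theta_R$. The volume of a parallelepiped
can be expressed as a triple product,

$$\left| \frac{\partial v}{\partial \theta_L} \times \frac{\partial v}{\partial \theta_R} \right| = \left| \left[ v, \frac{\partial v}{\partial \theta_L}, \frac{\partial v}{\partial \theta_R} \right] \right|.$$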

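Taking derivatives and substituting into this formula,

$$\frac{\partial v}{\partial \theta_L} = |V|^{-1} \frac{\partial V}{\partial \theta_L} - |V|^{-2}\, V\, \frac{\partial |V|}{\partial \theta_L}, \qquad \frac{\partial v}{\partial \theta_R} = |V|^{-1} \frac{\partial V}{\partial \theta_R} - |V|^{-2}\, V\, \frac{\partial |V|}{\partial \theta_R},$$

$$\left[ v, \frac{\partial v}{\partial \theta_L}, \frac{\partial v}{\partial \theta_R} \right] = |V|^{-3} \left[ V, \frac{\partial V}{\partial \theta_L}, \frac{\partial V}{\partial \theta_R} \right],$$

since every term with $V$ repeated in the triple product vanishes.

Therefore,

$$D_F(Q_L, Q_R) = \frac{1}{4\pi |V|^3} \left| \left[ V, \frac{\partial V}{\partial \theta_L}, \frac{\partial V}{\partial \theta_R} \right] \right| \qquad (2)$$

where,

$$\frac{\partial V}{\partial \theta_L} = \left( D_L \times \frac{\partial Q_L}{\partial \theta_L} \right) \times (D_R \times Q_R) \qquad (3)$$

$$\frac{\partial V}{\partial \theta_R} = (D_L \times Q_L) \times \left( D_R \times \frac{\partial Q_R}{\partial \theta_R} \right) \qquad (4)$$

Thus given values for $\theta_L$ and $\theta_R$, one can obtain $V$, $\partial V/\partial\theta_L$, and
$\partial V/\partial\theta_R$ using equations (1), (3), and (4). From these vectors,
equation (2) gives the value of the distribution.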
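
Equations (1) through (4) translate directly into a few lines of code. The sketch below assumes the $1/(4\pi)$ normalization and uses $\partial Q/\partial\theta = (-\sin\theta, \cos\theta, 0)$; the function names and the sample rays are illustrative. A finite-difference version of the area element is included only as a cross-check of the closed form.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def q(theta):
    return (math.cos(theta), math.sin(theta), 0.0)

def dq(theta):
    return (-math.sin(theta), math.cos(theta), 0.0)      # derivative of q(theta)

def big_v(d_l, d_r, t_l, t_r):
    return cross(cross(d_l, q(t_l)), cross(d_r, q(t_r)))  # equation (1)

def two_angle_density(d_l, d_r, t_l, t_r):
    v = big_v(d_l, d_r, t_l, t_r)
    dv_l = cross(cross(d_l, dq(t_l)), cross(d_r, q(t_r)))  # equation (3)
    dv_r = cross(cross(d_l, q(t_l)), cross(d_r, dq(t_r)))  # equation (4)
    return abs(dot(v, cross(dv_l, dv_r))) / (4.0 * math.pi * dot(v, v) ** 1.5)  # (2)

def finite_difference_density(d_l, d_r, t_l, t_r, h=1e-5):
    """Same quantity from central differences of the unit vector v = V / |V|."""
    def unit_v(a, b):
        v = big_v(d_l, d_r, a, b)
        n = math.sqrt(dot(v, v))
        return tuple(x / n for x in v)
    v = unit_v(t_l, t_r)
    dv_l = tuple((p - m) / (2 * h) for p, m in zip(unit_v(t_l + h, t_r), unit_v(t_l - h, t_r)))
    dv_r = tuple((p - m) / (2 * h) for p, m in zip(unit_v(t_l, t_r + h), unit_v(t_l, t_r - h)))
    return abs(dot(v, cross(dv_l, dv_r))) / (4.0 * math.pi)

D_L = (0.05, 0.02, 1.0)       # assumed rays to the left and right edge pixels
D_R = (-0.05, 0.02, 1.0)
analytic = two_angle_density(D_L, D_R, 0.9, 1.2)
numeric = finite_difference_density(D_L, D_R, 0.9, 1.2)
```

The closed form needs only a handful of cross products per angle pair, which is what makes it cheap enough to evaluate for every candidate match.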


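Actually, Arnold [2] uses the one dimensional distribution of
values of $Q_R$ for a given value of $Q_L$, instead of the two
dimensional distribution on $\langle Q_L, Q_R \rangle$. This one-dimensional
distribution has a similar formula and derivation:

$$D_F(Q_R \mid Q_L) = \frac{1}{2\pi} \left| \frac{\partial v}{\partial \theta_R} \right|,$$

$$\frac{\partial v}{\partial \theta_R} = \frac{1}{|V|^2} \left( |V| \frac{\partial V}{\partial \theta_R} - V \frac{\partial |V|}{\partial \theta_R} \right),$$

$$D_F(Q_R \mid Q_L) = \frac{1}{2\pi |V|^2} \left| V \times \frac{\partial V}{\partial \theta_R} \right|.$$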

As can be seen in figures 3-2 and 3-3, these distributions have
roughly the same shape, each favoring equal orientations in the
two images. The two axes in these figures correspond to $\theta_L$ and
$\theta_R$ ranging from 0 to 180 degrees in five degree increments.

3.3.2 Non-Ideal Three Camera Distribution
In general, the projections of the contour tangent line will
include some error, and so $D^L_{PF}(v, Q_L)$ will not be a delta function.
Yet we know that if the edge pixel orientations have a high
probability of being similar in the ideal case, they are likely to be
similar even after they are perturbed by some error. When the
distributions $D^L_{PF}(v, \theta_L)$, $D^R_{PF}(v, \theta_R)$, and $D^U_{PF}(v, \theta_U)$ are uniform
and bounded, it is in fact possible to numerically compute the
integral,

$$D_F(\theta_L, \theta_R, \theta_U) = \int_{v \in P} D_P(v)\, D^L_{PF}(v, \theta_L)\, D^R_{PF}(v, \theta_R)\, D^U_{PF}(v, \theta_U)\, dv,$$

which is the likelihood of the triple of angles $\langle \theta_L, \theta_R, \theta_U \rangle$.
Figure 3-4 shows a slice through this three dimensional
distribution for $\theta_L$ equal to 45 degrees, for three cameras arranged
in an equilateral triangle with unit side, and for a centered object
point ten units away from the cameras. The measured orientation
in each image is assumed to lie within four degrees of the ideal
value, with all values within the range being equally likely. The two
axes correspond to $\theta_R$ and $\theta_U$ ranging from 0 to 180 degrees in
five degree increments.

This three dimensional distribution takes too long to compute to
be practically useful for a matching algorithm. It has, however, the
following properties which make an approximation possible:

• The medial axis curve is the set of $\langle \theta_L, \theta_R, \theta_U \rangle$ which
satisfy the error free case.

• The value of the distribution on the axis curve is
roughly equal in value to the two dimensional
distribution at the same value of $\theta_L$ and $\theta_R$.

• Away from the axis curve, the distribution drops
roughly linearly to zero in a distance of about eight
degrees (twice the maximum angular deviation in
each image).

Using these facts, one can numerically approximate the correct
confidence measure using the formula for the two angle
distribution derived above.
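
One way to read the three properties above is as a linear falloff applied to the two-angle value. In the sketch below, `on_axis_value` is assumed to be the two-angle distribution evaluated at the observed left and right angles, and `deviation_deg` is assumed to be the angular distance of the triple from the error-free axis curve; how that distance is measured is not specified in the text, so it is left to the caller.

```python
def approx_triple_confidence(on_axis_value, deviation_deg, max_error_deg=4.0):
    """Approximate the three-angle distribution: the two-angle value on the axis,
    dropping linearly to zero at twice the per-image angular error."""
    cutoff = 2.0 * max_error_deg
    falloff = max(0.0, 1.0 - abs(deviation_deg) / cutoff)
    return on_axis_value * falloff
```

With the four degree error bound used for figure 3-4, a triple four degrees off the axis keeps half of the on-axis value, and anything eight or more degrees away gets zero.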




3.3.3 Failure Threshold for the Edge Orientation
Confidence Measure

For the edge orientation constraint, it is relatively easy to
compute the failure threshold. For edge pixels in one image, there
is no reason to favor any edge orientation over another. Therefore
the distribution is uniform on the unit circle. The distribution for a
triple of independently generated edge pixels is simply the product
of three uniform distributions,

$$D_{IND}(\theta_L, \theta_R, \theta_U) = D^L_{IND}(\theta_L)\, D^R_{IND}(\theta_R)\, D^U_{IND}(\theta_U),$$

which is a constant, and the failure threshold is this constant scaled by $p_{false}/p_{true}$.
For any reasonable values of $p_{true}$ and $p_{false}$, this threshold cuts
off the three dimensional distribution in figure 3-4 at a deviation of
twice the maximum angular error in each image, in this case eight
degrees.


3.4  Photometric  Constraint 

In section 2.3 we saw how the photometric constraint involves
attempting to match two sets of three regions. One set arises from
the low intensity region to one side of the edge pixels in the three
images. The other set arises from the high intensity side. Sample
values are taken in each region to generate the mean intensity
values, $\mu_L^-$, $\mu_R^-$, $\mu_U^-$, $\mu_L^+$, $\mu_R^+$, $\mu_U^+$. The claim is made that if the
three edge pixels arise from the same object contour point, the
three low values should be related and the three high values
should be related.


In sections 3.4.1 and 3.4.2, we will derive the confidence
measure and failure threshold for matched regions in three
images with $k$ sample points each. In section 3.4.3, we will see
how applying these results to the triple of darker side regions and
to the triple of lighter side regions of a set of matched edge pixels
enables us to detect edges that arise from occluding contours.


3.4.1 Confidence Measure of Photometric Constraint

The ideal case of matching three views of the same region
(either three darker side regions or three lighter side regions)
involves two assumptions. The first assumption is that the lighting
is strictly Lambertian so that any point on the object appears with
the same intensity from any viewpoint. The second assumption is
that the entire region being sampled has constant reflectivity.
Because the model we are using is so simple, any deviation from
these ideal assumptions must be treated as noise. We will assume
that the noise conforms to a Gaussian distribution.

Interestingly, these assumptions imply that we will see a greater
variance among sample points taken from three views of the same
region than we will see among sample points taken from one view
of a region. The reason for this difference is that sample points
taken from one viewpoint all see the same deviation from the ideal
Lambertian lighting model. Therefore only the variance caused by
non-uniform reflectivity affects them. Sample points taken from all
three images are affected by both deviation from uniform
reflectivity and deviation from the ideal lighting model and thus
have a larger variance. Let us denote the one view variance by $\sigma_1^2$
and the three view variance by $\sigma_3^2$.

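For the three image case, the distribution of image values $I$ for a
given object reflectance intensity $x$ is,

$$D_{PF}(x, I) = (2\pi\sigma_3^2)^{-1/2} \exp\left( -\frac{(x - I)^2}{2\sigma_3^2} \right).$$

The combined distribution for the $3k$ sample points is,

$$(2\pi\sigma_3^2)^{-3k/2} \exp\left( -\frac{1}{2\sigma_3^2} \left( \sum_{i=1}^{k} (x - I_{L,i})^2 + \sum_{i=1}^{k} (x - I_{R,i})^2 + \sum_{i=1}^{k} (x - I_{U,i})^2 \right) \right).$$

Integrating over all values of $x$ and taking twice the negative
logarithm gives a confidence measure,

$$\frac{1}{\sigma_3^2} \left( \sum_{i=1}^{k} (I_{L,i} - \mu)^2 + \sum_{i=1}^{k} (I_{R,i} - \mu)^2 + \sum_{i=1}^{k} (I_{U,i} - \mu)^2 \right) + (3k - 1) \log \sigma_3^2 + C,$$

where $\mu$ is the mean of all $3k$ sample points.

This confidence measure can be rewritten as,

$$\frac{3k}{\sigma_3^2}\, VAR + \frac{k}{\sigma_3^2} (\sigma_L^2 + \sigma_R^2 + \sigma_U^2) + (3k - 1) \log \sigma_3^2 + C \qquad (5)$$

where, for each image $j \in \{L, R, U\}$,

$$VAR = \frac{1}{3} \sum_{j} (\mu_j - \mu)^2, \qquad \mu = \frac{1}{3} (\mu_L + \mu_R + \mu_U),$$

$$\mu_j = \frac{1}{k} \sum_{i=1}^{k} I_{j,i}, \qquad \sigma_j^2 = \frac{1}{k} \sum_{i=1}^{k} (I_{j,i} - \mu_j)^2.$$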


The confidence measure (5) consists of the sum of three parts.
The first is proportional to the variance among the three means
obtained from the three regions; the second part is proportional to
the sum of the variances seen in each region; and the third
depends only on the choice of the three view variance $\sigma_3^2$, which is
the same for all matches. Thus a good match will have both low
variances in the three regions and closely matching means.
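
A minimal sketch of the photometric confidence measure follows. It drops the constant $C$, assumes the three regions contain the same number $k$ of samples, and uses the population (divide by $k$) variance within each region; the function names and sample intensities are illustrative. The direct sum over all $3k$ samples is included to confirm that the decomposition into variance-among-means and within-region variances is consistent.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def region_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def photometric_measure(left, right, up, sigma3_sq):
    """Measure (5) without the constant C; lower values indicate a better match."""
    k = len(left)
    means = [mean(r) for r in (left, right, up)]
    mu = mean(means)
    var_of_means = sum((m - mu) ** 2 for m in means) / 3.0
    within = sum(region_variance(r) for r in (left, right, up))
    return ((3 * k / sigma3_sq) * var_of_means
            + (k / sigma3_sq) * within
            + (3 * k - 1) * math.log(sigma3_sq))

def direct_measure(left, right, up, sigma3_sq):
    """Same quantity computed from the squared deviations of all 3k samples."""
    samples = list(left) + list(right) + list(up)
    mu = mean(samples)
    s = sum((x - mu) ** 2 for x in samples)
    return s / sigma3_sq + (len(samples) - 1) * math.log(sigma3_sq)

left = [10.0, 11.0, 9.0, 10.0]
right = [10.5, 9.5, 10.0, 10.0]
up_similar = [9.0, 10.0, 11.0, 10.0]
up_different = [20.0, 21.0, 19.0, 20.0]
good = photometric_measure(left, right, up_similar, sigma3_sq=4.0)
bad = photometric_measure(left, right, up_different, sigma3_sq=4.0)
```

Regions with closely matching means give a much smaller value than a triple in which one region has a clearly different mean intensity.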

3.4.2 Failure Threshold for the Photometric Constraint
As has been mentioned above, we expect to see a lower
variance among sample points taken from a single image. To
calculate the failure threshold, we must first determine the
probability density for $k$ sample points under this smaller variance
$\sigma_1^2$. Then we must take the product of densities for the three
groups of $k$ sample points. It turns out that twice the negative
logarithm of this product is,

$$\frac{k}{\sigma_1^2} (\sigma_L^2 + \sigma_R^2 + \sigma_U^2) + (3k - 3) \log \sigma_1^2 + C'.$$

Hence, incorrect hypotheses can be detected by comparing the
value of $VAR$, the variance among the means, against the
threshold,

$$\frac{\sigma_3^2 - \sigma_1^2}{3\sigma_1^2} (\sigma_L^2 + \sigma_R^2 + \sigma_U^2) + C'',$$

a linear function of the sum of the variances in the individual regions.
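
The threshold test can be sketched as below, assuming the variance among the means is defined with a factor of one third and the three regions have $k$ samples each. The parameter `c_diff` stands for the difference of the two additive constants ($C' - C$, which absorbs the prior probabilities of true and false matches); its value is not given in the text and must be supplied by the caller. All names are illustrative.

```python
import math

def stats(region):
    """Mean and population variance of one region's samples."""
    k = len(region)
    m = sum(region) / k
    return m, sum((x - m) ** 2 for x in region) / k

def summarize(left, right, up):
    """Return (variance among the three means, sum of the within-region variances)."""
    (m_l, v_l), (m_r, v_r), (m_u, v_u) = stats(left), stats(right), stats(up)
    mu = (m_l + m_r + m_u) / 3.0
    var_means = ((m_l - mu) ** 2 + (m_r - mu) ** 2 + (m_u - mu) ** 2) / 3.0
    return var_means, v_l + v_r + v_u

def var_threshold(within, k, s1, s3, c_diff):
    """Threshold on VAR: linear in the summed region variances.

    s1 and s3 are the one view and three view variances; c_diff is an assumed
    user-supplied constant (C' - C).
    """
    slope = (s3 - s1) / (3.0 * s1)
    offset = (s3 / (3.0 * k)) * ((3 * k - 3) * math.log(s1)
                                 - (3 * k - 1) * math.log(s3) + c_diff)
    return slope * within + offset

def is_plausible_match(left, right, up, s1, s3, c_diff):
    var_means, within = summarize(left, right, up)
    return var_means < var_threshold(within, len(left), s1, s3, c_diff)

left = [10.0, 11.0, 9.0, 10.0]
right = [10.5, 9.5, 10.0, 10.0]
accept = is_plausible_match(left, right, [9.0, 10.0, 11.0, 10.0], 1.0, 4.0, 20.0)
reject = is_plausible_match(left, right, [20.0, 21.0, 19.0, 20.0], 1.0, 4.0, 20.0)
```

Comparing VAR against this linear threshold is algebraically the same as comparing the three view measure with the independent measure, so only the means and variances of the three regions need to be kept.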


3.4.3  Detection  of  Occluding  Edges 
The failure threshold can enable the matching algorithm to
determine if a triple of edge pixels has been generated by an
occluding contour. For each hypothesized match, there are two
photometric confidence measures, one generated by the three
darker regions on one side of the edge pixels and the other
generated by the three lighter regions on the other side of the
edge pixels. In general, the algorithm uses the sum of the two
confidence measures as the overall measure of the goodness of
the hypothesized match. If one and only one side falls below the
value of failure threshold for that set of intensity samples, then the
algorithm hypothesizes that the edge is occluding. Recall that the
failure threshold is really a measure of the confidence that the
three regions are generated independently. Hence the overall
confidence for an occluding edge is the sum of the failure
threshold and the confidence measure for the other (matching)
side.
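
The decision logic of this section can be sketched as a small function. It assumes scores for which larger means more confident (for example log-likelihoods), so that "falling below" the threshold indicates failure; rejecting a triple when both sides fail is also an assumption, since the text only describes the one-sided case. The names and values are illustrative.

```python
def classify_edge(dark, light, dark_threshold, light_threshold):
    """Combine the darker side and lighter side photometric confidences.

    Returns (label, overall_confidence) where label is 'match', 'occluding',
    or 'reject'.
    """
    dark_ok = dark >= dark_threshold
    light_ok = light >= light_threshold
    if dark_ok and light_ok:
        return "match", dark + light                 # both sides agree across views
    if dark_ok:
        return "occluding", dark + light_threshold   # lighter side treated as independent
    if light_ok:
        return "occluding", dark_threshold + light   # darker side treated as independent
    return "reject", None                            # assumed: neither side matches

both = classify_edge(-3.0, -4.0, -10.0, -10.0)
one_side = classify_edge(-3.0, -25.0, -10.0, -10.0)
neither = classify_edge(-30.0, -25.0, -10.0, -10.0)
```

Substituting the threshold for the failed side keeps occluding-edge hypotheses on the same scale as ordinary matches, so they can compete in the same list.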


Figure 3-1: Area on the Gaussian Sphere about $v$
Generated by Infinitesimal Changes in $\theta_L$ and $\theta_R$


Figure 3-3: Distribution of $\theta_R$ as a Function of $\theta_L$


Figure 3-4: Slice through Distribution of $\langle \theta_L, \theta_R, \theta_U \rangle$


4.  Filtering  Competing  Triples 

We have developed two confidence measures for the trinocular
matching algorithm to use in comparing competing matches.
These measures imply a partial order on the likelihood of
matches. If one match has better confidence in both categories
than another match, the first is clearly the more likely choice. Two
matches are not necessarily orderable, however, if one has better
edge orientation confidence and the other better photometric
confidence. Despite this possibility, a trinocular stereo algorithm
can obtain disparities for most of the edge pixels by means of the
following steps:

• Generate all triples of edge pixels which satisfy the
position constraint.

• Filter out matches for each pixel which are clearly less
likely than another match at that pixel.

• Out of the set of unorderable match triples at each
pixel, remove those which are not also strong
matches at the other two pixels in the triple.

• If possible, cluster the remaining matches at each
pixel by disparity.


Figure 3-2: Distribution of $\langle \theta_L, \theta_R \rangle$


The geometric position constraint on pixels is subject to some
deviation, but not enough to make a statistical analysis of the error
necessary. Generate the set of all matching triples as follows:

• Visit each edge point in the left image.

• For each point visited, visit all edge points in the right
image within one pixel of the corresponding epipolar
line.

• For each generated pair of edge points from the left
and right images, check in the up image in a one pixel
radius about the intersection of the corresponding
epipolar lines. Every up image edge point found
within this radius generates a match triple.
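
The three steps above can be sketched as a triple loop over edge points. The sketch assumes the idealized equilateral rig with parallel cameras, pixel coordinates in which y increases toward the up camera, and the convention from section 5.1 that right-image points lie to the left of their left-image counterparts. The function name and the sample coordinates are hypothetical, and a practical implementation would index edge points by scan line rather than scanning whole lists.

```python
import math

def candidate_triples(left_pts, right_pts, up_pts, tol=1.0):
    """Enumerate (left, right, up) edge point triples satisfying the position constraint."""
    triples = []
    for xl, yl in left_pts:
        for xr, yr in right_pts:
            # Left/right epipolar lines are scan lines; allow one pixel of slack.
            if abs(yr - yl) > tol or xr > xl:
                continue
            d = xl - xr
            # Intersection of the two epipolar lines in the up image (assumed geometry).
            ux, uy = xl - 0.5 * d, yl - 0.5 * math.sqrt(3.0) * d
            for xu, yu in up_pts:
                if math.hypot(xu - ux, yu - uy) <= tol:
                    triples.append(((xl, yl), (xr, yr), (xu, yu)))
    return triples

left_edges = [(100, 50)]
right_edges = [(80, 50), (60, 51), (120, 50)]
up_edges = [(90, 33), (80, 15), (10, 10)]
found = candidate_triples(left_edges, right_edges, up_edges)
```

In this example the left point pairs with two right points, each confirmed by one up-image point, while the right point with negative disparity and the stray up-image point produce no triples.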

Suppose we have two triples of edge pixels which share the
same edge pixel in the left image. If they are orderable, meaning
one has better orientation confidence as well as photometric
confidence, then the worse of the two is eliminated from the list of
possible matches for that left edge pixel. It may also be beaten by
triples with which it shares the same edge pixel in the right or up
image. One hopes that the majority of triples will be rejected at all
three locations and thus be dropped from consideration.

Once the algorithm has considered all matched triples, each
edge pixel in each image will have a list of unorderable matches.
Each triple in the list of a left image edge pixel may or may not
have been rejected from the appropriate lists for the right and up
images. Clearly it is better that a triple be accepted in all three
images, less good if it has been rejected once, and worse yet if it
appears in a list in only one image. Thus the algorithm has a new
basis for comparison of the unorderable triples at a given edge
pixel. It can reject those triples in the list which have been
rejected more times than the best element.
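
The two filtering passes can be sketched as follows. Triples are represented as dictionaries with hypothetical keys `pixels` (identifiers of the left, right, and up edge pixels), `orient`, and `photo`; both confidences are assumed to be oriented so that larger is better.

```python
def dominates(a, b):
    """True if triple a is better than triple b in both confidence categories."""
    return a["orient"] > b["orient"] and a["photo"] > b["photo"]

def filter_triples(triples):
    """Return, for each (image index, pixel id), the indices of the surviving triples."""
    lists = {}                                  # (image, pixel) -> indices of triples
    for i, t in enumerate(triples):
        for image, pixel in enumerate(t["pixels"]):
            lists.setdefault((image, pixel), []).append(i)
    # First pass: in each pixel's list, reject every triple dominated by another.
    rejected = {key: {i for i in members
                      if any(dominates(triples[j], triples[i]) for j in members)}
                for key, members in lists.items()}
    # Number of lists (0 to 3) from which each triple was rejected.
    count = [sum(i in rejected[key] for key in lists) for i in range(len(triples))]
    # Second pass: among unorderable survivors, keep those rejected least often elsewhere.
    result = {}
    for key, members in lists.items():
        survivors = [i for i in members if i not in rejected[key]]
        best = min(count[i] for i in survivors)
        result[key] = [i for i in survivors if count[i] == best]
    return result

triples = [
    {"pixels": ("L1", "R1", "U1"), "orient": 0.9, "photo": 0.9},
    {"pixels": ("L1", "R2", "U2"), "orient": 0.5, "photo": 0.5},
    {"pixels": ("L2", "R2", "U3"), "orient": 0.8, "photo": 0.4},
    {"pixels": ("L2", "R3", "U2"), "orient": 0.4, "photo": 0.8},
]
kept = filter_triples(triples)
```

Here triple 1 is dominated at its left pixel, so it also loses to the unorderable triples it competes with at its right and up pixels, while triples 2 and 3 remain tied at their shared left pixel.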

At this point, the algorithm clusters unorderable triples in a given
list by disparity distance. If all the triples agree to within a few
pixels, then the algorithm takes the average disparity as the
correct value. If the unorderable matches do not lie in a single
cluster, no correct match can be made.
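
This final step amounts to a spread test followed by an average. In the sketch below the `spread` of three pixels is an assumed stand-in for "a few pixels", and the function name is illustrative.

```python
def resolve_disparity(disparities, spread=3.0):
    """Average the disparities of the remaining triples if they form one cluster.

    Returns None when the list is empty or the values disagree by more than
    `spread` pixels, in which case no match is reported for the pixel.
    """
    if not disparities:
        return None
    if max(disparities) - min(disparities) > spread:
        return None
    return sum(disparities) / len(disparities)
```

For example, disparities of 20, 21, and 22 pixels resolve to 21, whereas 20 and 40 pixels leave the edge pixel unmatched.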


5. Performance

The trinocular edge matching algorithm was applied to a set of
synthetic images and two sets of real images. In each case it
performed well, despite the fact that it used no continuity
principles (neighboring edge pixels should have the same depth)
or other high level information. The algorithm also ran relatively
rapidly, using no more than five minutes on a VAX/785 to process
the real image.

5.1  Synthetic  Image 

A graphics system generated the synthetic image. It consists of
a collection of overlapping "boxes" created from rectangular
plates. Each plate has a smaller rectangular region in its center
with a slightly lower reflectivity. The lighting model is Lambertian,
with two sources of light, one behind the cameras to the upper
left, and one behind the cameras to the upper right. The image
was created at a resolution of 512 by 512 pixels and then reduced
by averaging to 256 by 256 pixels. The cameras are arranged in
an equilateral triangle, and the disparity ranges from .'0 to 60
pixels. The three images are shown in figures 5-1, 5-2, and 5-3,
and the edge images in figures 5-4, 5-5, 5-6.


In order to test the algorithm, this image was deliberately made
cluttered with both occluded edges and occluding contours. It
has, however, a number of advantages over a real image. The
edge orientations and positions are very accurate, and the sample
variances on a particular surface are very low. In other words,
most of the assumptions necessary for good performance of the
confidence measures hold true. No assumptions, however, were
made about the range of disparities except that points in the right
image were presumed to be to the left of points in the left image.
The algorithm used no multiple resolution or other continuity
techniques.

The algorithm was run twice in order to test the detection of
occluding edges. In the first run, no failure threshold was used for
the photometric confidence measure. In the second run, the
failure threshold and the scheme for detecting occluding edges
were used. Figure 5.2 shows the result of the matching where edge
points with a larger calculated disparity are represented by darker,
thicker lines. As expected, in the first run, the algorithm failed to
match edges arising from occluding contours. In the second run
with the failure threshold, it matched most of these without a
significant degradation in performance elsewhere in the scene.
Figure 5.2 shows the result of this second run, and figure 5.2
shows the correct disparity map for all unoccluded edge points
(which could be calculated since these were synthetic images). In
both cases, the performance was excellent on finding correct
matches when they existed, although the algorithm did tend to
accept an incorrect match when no correct match existed.


5.2  Real  Images 

Two sets of images were taken in the vision lab at CMU. The first
set was generated by mounting the camera to the slider of a
drawing table. By turning the table vertically, we constrained the
camera motion to the horizontal and vertical directions. The three
camera positions formed an equilateral triangle, but because the
camera was held in a slightly rotated position in the mounting, the
camera model had to be determined by hand matching 25 points
in the three images. Instead of rectifying the images, the
binocular algorithm was modified to allow the epipolar line
orientation to vary with pixel position. Unfortunately, due to the
crudeness of the mounting arrangement, the model showed as
much as two pixel deviation of edge points from their corresponding
epipolar lines. The algorithm was modified to search in this range.
Figures 5-10, 5-11 and 5-12 show the first set of images, and
figures 5-13, 5-14 and 5-15 show the edge points. The second set
of real images was generated using a tripod and marks on the
floor. The camera model for this second set much more closely
matched the ideal equilateral case. Figures 5-16, 5-17 and 5-18
show the second set of images, and figures 5-19, 5-20 and 5-21 show
the edge points.

The images were 256 by 240 pixels (reduced by averaging from
the 512 by 480 camera output) and the disparity range was 20 to
r"l pixels in the first image and 10 to 55 pixels in the second image.
The algorithm was restricted to search only in this disparity range.
The photometric failure threshold was used.

The results were good for both scenes as shown in figures 5-22
and 5-23. In the first scene, the matching was better in the center
of the image where the camera model was more accurate. In the
second scene, the matching was good overall.




Figure 5-14: First Real: Right Edges


6.  Conclusion 

The trinocular matching algorithm performs very well even
though it uses the same order of computing resources as the
binocular method. And because of the additional constraints
provided by the third image, the trinocular algorithm does not
depend on any continuity assumptions which are usually
necessary to make a binocular matching scheme work. The
trinocular method can also match horizontal and near horizontal
edges, which the binocular scheme cannot.

Both the binocular and trinocular methods depend rather
strongly on an accurate camera model. An inaccurate camera
model forces the algorithm to scan a strip around the epipolar
constraints. The resulting increase in the number of geometrically
matched triples to be evaluated and compared degrades the
running time of the algorithm and increases the number of false
matches and missed matches. The trinocular algorithm coped
with a two pixel error in the case of the vision lab image but at
some cost in performance.

The algorithm presented here is a true trinocular algorithm. It
considers all three images at the same time, not as three binocular
pairs. As a result, the number of occluded contour points is
greater than for any of the binocular pairs, because in order to be
matched, a contour point must be visible from all three cameras. If
the algorithm were extended to use higher level assumptions, it
could be made to match points visible in only two out of three
images. For example, the system could perform three dimensional
curve tracing in order to reconstruct the three dimensional
contours on the object. The increased understanding of the
object structure could enable the system to eliminate "single bit"
errors. More important, once this step is taken for the contour
points visible from all three images, the contours could be
extended based on the information available in two of the images.
Hence, even if some parts of a contour were not visible in one
image, the contour points could still be matched reliably. In this
case the trinocular method would match more points than the
binocular method because more points are visible from two out of
three cameras than from two out of two.


References

1. John Aloimonos, Amit Bandyopadhyay, and Paul Chou. On the
Foundations of Trinocular Machine Vision. University of
Rochester Department of Computer Science, Rochester, NY
14627, May, 1985.

2. R. D. Arnold and T. O. Binford. Geometric Constraints in
Stereo Vision. Computer Science Department, Stanford
University, 1980.

3. S. T. Barnard and W. B. Thompson. "Disparity Analysis of
Images". IEEE Transactions on Pattern Analysis and Machine
Intelligence 2 (1980), 334-340.

4. Minoru Ito and Akira Ishii. Three-View Stereo Analysis.
Musashino Electrical Communication Laboratory, Nippon
Telegraph and Telephone Public Corporation, 3-9-11, Midori-cho,
Musashino-shi, Tokyo, 180 Japan, 1985.

5. J. Y. S. Luh and John A. Klaasen. "A Three-Dimensional
Vision by Off-Shelf System with Multi-Cameras". IEEE
Transactions on Pattern Analysis and Machine Intelligence 7, 1
(January 1985), 35-45.

6. Y. Ohta and T. Kanade. Stereo by Two-Level Dynamic
Programming. Proceedings of the Ninth IJCAI, 1985, pp. 1120.

7. Roger Y. Tsai. Multiframe Image Point Matching and 3-D
Surface Reconstruction. Research Report RC 0398 (# 41469),
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598,
May, 1982.

8. Masahiko Yachida. 3-D Data Acquisition by Multiple Views.
Third International Symposium on Robotics Research, Paris,
France, October, 1985.




Edgel-Aggregation and Edge-Description


Vishvjit S. Nalwa

A.I. Lab., Stanford University, CA 94305

Eric Pauchon*

E.T.C.A., 94114 Arcueil Cedex, France


Abstract

An edge in an image corresponds to a discontinuity
in the intensity surface of the underlying scene.
It can be approximated by a piecewise straight curve
composed of edgels, i.e. short, linear edge-elements,
each characterized by a direction and a position. In a
previous paper [Nalwa'84] we described a strategy to
detect edgels. Edgels, by themselves, are of little use
in vision systems. In this paper we proceed to discuss
algorithms to aggregate edgels into edges and to
describe these edges by best-fit curves.

The edgel-linking algorithm is simple and has a
local character. It relies only on edgel proximity and
direction. The basis used for edge-description consists
of conic-sections. Position and tangent continuity are
maintained in the curve-fitting stage. The problems
addressed include the discovery of straight lines and
their discrimination from low-curvature segments, the
detection of corners, the choice of knots and the
estimation of the distance of an edgel from a
conic-section. We demonstrate our algorithms with a
detailed example.


I. Introduction

It is hard to over-emphasize the importance of
edge-detection in image understanding. Most modules
in a conceivable vision system depend, directly or
indirectly, on the performance of the edge-detector.
Consequently, there has been a substantial effort in
this direction. Despite this effort, many in the
community believe that the problem is largely unsolved.
In fact, it may be claimed with some justification
that research and motivation on other fronts (e.g.
stereo and line-drawing interpretation) has been
dampened by the ineffectiveness of existing detectors.

An edge in an image corresponds to a discontinuity
in the intensity surface of the underlying
scene. It can be approximated by a piecewise straight
curve composed of edgels, i.e. short, linear
edge-elements, each characterized by a direction and a
position. In a previous paper [Nalwa'84] we described
a strategy to detect edgels with step-profiles, which by
far are the most common type. Edgels, by themselves,
are of little use in vision systems. In this paper
we proceed to discuss algorithms to aggregate edgels
into edges and to describe these edges by best-fit
curves.


This work was supported in part by the Defense Advanced
Research Projects Agency under contract N00039-84-C-0211.
*E.P. was a visiting research scientist at the Stanford A.I. Lab.
during the 1984-85 academic year.

The problem addressed in this paper is posed as
the following. Given a list of edgels belonging to an
image, where each edgel is characterized by its
orientation, the location of its center, the intensities on its
two sides and the window in which it was detected.
First, aggregate the edgels into ordered sets
corresponding to individual extended edges. Then,
describe these edges by fitting curves to their
edgel-members in some best-fit sense. Position and tangent
continuity are to be preserved in the curve-fitting
stage. Also, the unique role played by straight lines in
subsequent vision-system-modules demands that we
distinguish them from low-curvature segments.

Compared to the vast literature on edge-detection,
the work on edgel-linking is meager. A
review of previous work is to be found in [Ballard &
Brown'82]. Our approach to edgel-linking has a
local-character and employs few heuristics or
thresholds. In fact, it is simple enough for us to believe
that it is likely to have been implemented previously,
although we have been unable to find any published
source. Our algorithm is a three step process. Step 1:
map the edgels onto a grid. The spacing of the grid is
determined by the quality of the edgel-detector. We
use a square grid with half-pixel spacing. Step 2: thin
the edgels mapped onto the grid so as to obtain
minimum connectivity, in the 8-neighbour sense,
between the edgel-centers. This thinning stage must
be distinguished from the common thinning procedure
used on the output of many edge detectors. The aim
of Steps 1 and 2 is not to localize the edge, but only
to avoid all search in Step 3 by forming a minimally
connected graph between the edgels. Step 3: starting
with an ungrouped edgel, extend the edgel-set in both
directions by following the connectivity-graph
obtained from Step 2. Contour-following rather than
search is involved in this stage. Decisions about the
choice of the next member of the current edgel-set
are to be made only at junctions and are based on
local orientation compatibility. The details of this
algorithm are discussed in Section II.



Similar to the approach in [Turner'74], we use
the plot of tangent vs arc-length to segment our edges
into conic-sections and straight lines. The local
orientation along the curve is directly available to us from
the edgel parameters and the arc-length is estimated
by obtaining a polygonal approximation to the edge.
The intrinsic representation is used to discover
straight lines and to choose candidate knots from
among the edgels. It is also used to detect corners.
We use conic-sections to describe non-straight edge
segments. Although conics have often been used to fit
approximating curves to data [see Pavlidis'83], unlike
most previous attempts we also deal with the problem
of position and tangent continuity between adjacent
segments. When one chooses to fit a curve based on
an error criterion dependent on the distance of the
data from the curve, an important accompanying
concern is the estimation of this distance. It is often a
non-trivial problem and crude approximations show
up in the fitting-curves having systematic deviations
from the optimum fit. We formulate a distance
measure, for a point from a conic, which overcomes some
of the problems associated with previous formulations.
We also indicate how one could obtain an exact
solution for this distance and why it is impractical to use
it. The details of the curve-fitting approach are
outlined in Section III.

We should mention that it is not our intention to
give detailed algorithms in this paper. Only broad
outlines are presented and some of the issues we have
concerned ourselves with are discussed. The details
of the specific implementation are numerous and may
vary.

In Section IV, we present and discuss a detailed
example and finally in Section V we conclude with an
outline of some of the important issues in linking and
curve-fitting which need further research.


II. Linking

As mentioned in the introduction, our linking
algorithm is local-based and is essentially a three-step
process. Our chief concern in its design was simplicity
and effectiveness without unnecessary heuristics.

Fig. 1. Connectivity-Grid. [Sketch: each cell of the
connectivity-grid points to an edgel-data node holding
orientation, position and intensities.]

Fig. 2. Sample masks used to map edgels onto
the connectivity-grid. [Panels: mask for edgel at 0 deg.;
mask for edgel at 40 deg.]

In the first step, the edgels are mapped onto a
square grid which we call the connectivity-grid for
reasons which will soon become clear. This grid is
conceived to be a 2-D array of cells. Each cell
contains a flag and a pointer. The corresponding
data-structure is sketched in Fig. 1. The grid-spacing
should be chosen on the basis of the quality of the
edgel input. The spacing should be such that the
resolution of the edgel-detector is maintained and
edgels belonging to the same edge do not have gaps
between their mappings. We chose a half-pixel
spacing for edgel-data obtained from the application of the
Nalwa Operator [Nalwa'84] to an image. Some of the
masks used for the mapping are shown in Fig. 2. The
appropriate portion of the corresponding mask is
copied onto the grid. The flag of a cell indicates
whether an edgel-center, an edgel-extension in the
detecting-window or none of the above is mapped
onto the cell. An edgel-center flag supersedes an
edgel-extension flag. The pointer in each cell with an
edgel-center flag points to a node containing the
corresponding edgel-parameters, i.e. position,
orientation and intensities. If more than one edgel-center is
mapped onto a cell then the various parameters are
averaged. We must emphasize that the connectivity
grid is used only to obtain connectivity information
for the edgels.
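
The cell bookkeeping of the first linking step can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, the oriented masks of Fig. 2 are not reproduced (extension cells are marked one at a time), intensities are omitted for brevity, and the orientation average is a naive arithmetic mean that ignores angular wrap-around.

```python
import math
from dataclasses import dataclass

EMPTY, EXTENSION, CENTER = 0, 1, 2  # cell flags

@dataclass
class EdgelNode:
    x: float
    y: float
    orientation: float  # degrees
    count: int = 1      # number of edgel-centers averaged into this node

class ConnectivityGrid:
    def __init__(self, width, height, spacing=0.5):
        self.spacing = spacing  # half-pixel spacing, as chosen in the text
        cols = int(width / spacing) + 1
        rows = int(height / spacing) + 1
        self.flag = [[EMPTY] * cols for _ in range(rows)]
        self.node = [[None] * cols for _ in range(rows)]

    def cell_of(self, x, y):
        # Nearest grid cell (row, column) for an image position.
        return (math.floor(y / self.spacing + 0.5),
                math.floor(x / self.spacing + 0.5))

    def mark_extension(self, x, y):
        r, c = self.cell_of(x, y)
        if self.flag[r][c] == EMPTY:  # never overwrite a center flag
            self.flag[r][c] = EXTENSION

    def mark_center(self, x, y, orientation):
        r, c = self.cell_of(x, y)
        self.flag[r][c] = CENTER      # supersedes an extension flag
        old = self.node[r][c]
        if old is None:
            self.node[r][c] = EdgelNode(x, y, orientation)
            return
        n = old.count                 # running average of the parameters
        old.x = (old.x * n + x) / (n + 1)
        old.y = (old.y * n + y) / (n + 1)
        old.orientation = (old.orientation * n + orientation) / (n + 1)
        old.count = n + 1
```

Keeping the flags and the parameter nodes in separate arrays mirrors the flag-plus-pointer cell described above: the grid carries only connectivity, while the edgel parameters live in the nodes.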

As some of the edgels will inevitably be tangent
to high-curvature edges, their extremes will deviate
from the underlying edge and may result in false
connectivity during the first step. Hence, it may be
desirable to disregard the non-overlapping extremes of
edgels.

In the second step, we thin the edgel mappings
onto the connectivity-grid in order to obtain a
minimally connected (in the 8-neighbour sense) graph.
For this purpose, one can use any of the standard
thinning algorithms [see Pavlidis'82] with minor
modifications. We use what Pavlidis [Pavlidis'82]
calls the Classical Thinning Algorithm, simply because
it was the easiest to implement. We first thin out all
the non-skeletal edgel-extension cells. Then, we thin
the grid so as to obtain a minimally connected graph
(in the 8-neighbour sense). During this process care is
taken not to discard any edgel-center cell without
doing one of the following. If it neighbours a skeletal
edgel-center cell, the parameters are averaged
appropriately, else a pointer from a neighbouring
skeletal edgel-extension cell is established to its
data-node. Essentially, we do not want to carelessly
discard any information. By reducing the connectivity
grid to a minimally connected graph, we ensure that
when we group edgels by contour following in step
three, we will obtain a nearly completely ordered set,
i.e. the edgels will be ordered on the basis of their
position along the edge. This will be important for
our curve-fitting strategy.

In the third step we group the edgels into ordered
sets, each corresponding to an edge. This stage
simply involves starting out with an unclassified
edgel-center cell and extending the edgel-set by following
the minimally connected graph in both directions.
The starting edgel-center is chosen to be at most
2-connected in order to avoid the complications of
starting out at a junction. Decisions about the choice of
the next member of the set arise only at cells with
connectivity greater than 2, i.e. at junctions. These
decisions are based on the compatibility of the
orientation of the last edgel-member of the current set and
that of the candidate edgels.
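
The contour-following of the third step can be sketched as follows, assuming the thinned grid has already been converted into an adjacency mapping. The names `link_edgels`, `follow` and `angle_diff`, the dictionary representation, and the rule of always taking the most orientation-compatible unvisited neighbour are illustrative assumptions rather than the authors' implementation.

```python
def angle_diff(a, b):
    """Smallest difference between two orientations in degrees (modulo 180)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def follow(graph, orientation, start, first, visited):
    """Walk away from `start` through `first`; return the cells reached, in order."""
    path, prev, cur = [], start, first
    while cur not in visited:
        visited.add(cur)
        path.append(cur)
        candidates = [n for n in graph[cur] if n != prev and n not in visited]
        if not candidates:
            break
        # A real choice arises only at junctions: pick the neighbour whose
        # orientation is most compatible with the current member.
        nxt = min(candidates,
                  key=lambda n: angle_diff(orientation[cur], orientation[n]))
        prev, cur = cur, nxt
    return path

def link_edgels(graph, orientation):
    """Group cells of a minimally connected graph into ordered edgel-sets."""
    visited, edges = set(), []
    for start in graph:
        # Start only at unclassified cells that are at most 2-connected.
        if start in visited or len(graph[start]) > 2:
            continue
        visited.add(start)
        nbrs = list(graph[start])
        forward = follow(graph, orientation, start, nbrs[0], visited) if nbrs else []
        backward = follow(graph, orientation, start, nbrs[1], visited) if len(nbrs) > 1 else []
        edges.append(backward[::-1] + [start] + forward)
    return edges
```

For a T-shaped graph, a horizontal chain passes straight through the junction, and the perpendicular branch becomes a separate edgel-set that terminates at the junction.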

It  is  often  feasible  to  extend  edges  and  connect 
adjacent  edge-terminations  if  one  lowers  the  threshold 
on  the  edgel-contrast.  One  way  to  incorporate  this 
capability  into  the  above  algorithm  is  to  obtain  the 
connectivity  graph  for  all  edgels  with  a  contrast 
greater  than  the  lower  threshold  and  then,  in  step 
three, to discard all edgel-sets which do not have any
edgel with contrast greater than the higher threshold.
However, we must be careful about random
edge-extensions which do not have orientation-
compatibility with the main-edge.
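
A minimal sketch of this two-threshold scheme, assuming the linking has already been run on all edgels above the lower threshold and that each edgel-set is represented simply by the list of its edgel contrasts (the function name and representation are hypothetical):

```python
def keep_strong_sets(edgel_sets, high):
    """Keep only edgel-sets with at least one edgel of contrast above `high`.

    Each edgel-set is given as a list of contrast values; the sets are
    assumed to have been linked from edgels above the lower threshold.
    """
    return [s for s in edgel_sets if any(c > high for c in s)]
```

The orientation-compatibility check on weak extensions mentioned above is not covered by this sketch.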

At the conclusion of the linking stage, we have
ordered lists of edgels belonging to individual
extended edges. Junction information is also
maintained, i.e. information about edgel-list terminations
at junctions and edgel-list intersections is preserved.
We remind the reader that the relevant parameters,
i.e. orientation, position and intensities, for all edgels
remain accessible.


III. Curve-Fitting

Some of the issues encountered when attempting
to fit curves to data are the following. What should
be the order of continuity between adjacent curve
segments? It is well known that humans are sensitive to
both position and tangent discontinuities. Then, we
must choose the family of curves we intend to fit.
This choice must be compatible with our continuity
requirements and must also be amenable to reasonable
error-criteria formulations. Next, we must decide on
how to choose knots given our choice of curves. We
may also want to discover instances of certain features
(e.g. straight lines, corners) which play an important
role in subsequent processing.


Fig. 3. Guided Conic.

Because humans are sensitive to $C^1$ discontinuities
in curves and, in a sense, our confidence about the
feasibility of machine vision is based on their
performance, we believe that tangent continuity is a
necessary requirement for fitting curves to edgel-data. As
regards curvature discontinuities, we consider the
empirical evidence in support of human sensitivity to
them to be insufficient. Having decided on the order
of continuity desired, we must now choose a family of
curves. We decided to use conics for a variety of
reasons. Firstly, they can satisfy our continuity
requirements. Secondly, unlike with functions of an
independent variable, we do not have to deal with awkward
segmentation at points with infinite slope. Further,
the family of conics has been studied as far back as
Apollonius and their properties are well documented.
Also, unlike higher order curves they do not have
inflection points. The implications are that we must
introduce knots at inflection points, but more
importantly that if the knots are well-chosen then
unnecessary wiggles are avoided.

An excellent introduction to conics is given in
[Pavlidis'83]. It indicates how one may proceed to fit
conic splines to data without solving explicitly for the
six parameters in the algebraic representation
$a x^2 + b x y + c y^2 + d x + e y + f = 0$. We repeat
the formulation here. Let A and B be two points with
known tangents $a_0 + a_1 x + a_2 y = 0$ and
$b_0 + b_1 x + b_2 y = 0$ respectively, as shown in Fig. 3.
Further, let the line connecting A and B have the
equation $u_0 + u_1 x + u_2 y = 0$. Then, the family of
conics which passes through A and B with their
specific tangents has one degree of freedom, say K. It
can easily be verified that

$K (u_0 + u_1 x + u_2 y)^2 = (a_0 + a_1 x + a_2 y)(b_0 + b_1 x + b_2 y)$

represents this family. We have the freedom to choose K.
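
The claim that this family passes through A and B with the prescribed tangents can be checked numerically. The sketch below is illustrative: the helper names are hypothetical, and the points A = (0, 0), B = (2, 0) and the tangent intersection C = (1, 1) are arbitrary example values. It writes the conic as $F = K u^2 - a b = 0$, where $u$, $a$ and $b$ denote the left-hand sides of the chord and tangent line equations, and verifies that F vanishes at A and B and that its gradient there is parallel to the normal of the corresponding tangent line.

```python
def line_through(p, q):
    """Coefficients (c0, c1, c2) of the line c0 + c1*x + c2*y = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    c1, c2 = y1 - y2, x2 - x1
    return (-(c1 * x1 + c2 * y1), c1, c2)

def line_eval(l, x, y):
    return l[0] + l[1] * x + l[2] * y

def conic_value(a, b, u, K, x, y):
    """F(x, y) = K*u^2 - a*b; the guided conic is the locus F = 0."""
    return K * line_eval(u, x, y) ** 2 - line_eval(a, x, y) * line_eval(b, x, y)

def conic_gradient(a, b, u, K, x, y):
    """Gradient of F; it is normal to the conic at points on the conic."""
    uv, av, bv = line_eval(u, x, y), line_eval(a, x, y), line_eval(b, x, y)
    gx = 2 * K * uv * u[1] - a[1] * bv - av * b[1]
    gy = 2 * K * uv * u[2] - a[2] * bv - av * b[2]
    return gx, gy

# Example configuration (assumed values): the tangents at A and B meet at C.
A, B, C = (0.0, 0.0), (2.0, 0.0), (1.0, 1.0)
tangent_a = line_through(A, C)
tangent_b = line_through(B, C)
chord = line_through(A, B)
```

Because both the chord and the tangent at A vanish at A, only the term involving the other tangent survives in the gradient there, which is why the conic inherits the tangent direction regardless of K.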

Given our choice of conics and the $C^1$ continuity
constraint, we have a one-parameter family of curves
for every given set of edgels with their end-points and
tangents at their end-points specified. We minimize
the sum of square-errors to determine K, i.e. minimize
$\sum_i d_i^2$ where $d_i$ is the distance of the $i$-th edgel
from the conic. In Appendix I, we indicate how $d_i$
may be estimated. Our estimation technique is more
accurate than previous approximations [e.g.
Turner'74] and overcomes some of their drawbacks,
e.g. a tendency to produce "flat" conic-fits. We
indicate in Appendix II how one may proceed to find the
exact distance of a point from a conic and why it is
impractical to do so. It can immediately be seen that
our error-criterion is invariant to equiform
transformations of the image-plane, i.e. our best-fit curve
undergoes the same translation, rotation and change
in scale as the edgel-data. This, of course, is a very
desirable feature.

Now we address the problem of segmentation of
a set of edgels into sub-groups, each of which will be
fit with a separate curve maintaining position and
tangent continuity at the knots. As mentioned
before, each edgel has its center-position and
orientation specified. Also, the edgels in every set are
ordered on the basis of their position along the
contour. We use the ψ-s representation, i.e. the plot of
tangent vs arc-length, to determine the knots. As the
orientation of every edgel is known, we only need an
estimate for the arc-length at the edgel-centers to
obtain the ψ-s plot. It is not advisable to estimate s
by simply taking the cumulative sum of the
inter-edgel distances because of the "coast-line" effect, i.e.
such an estimate will over-shoot the actual length
owing to the scatter of the data about the underlying
curve. We use a polygonal approximation to the
edgels to determine s. Our line-fitting algorithm is
adapted from a split-and-merge algorithm listed in
[Pavlidis'82]. Once a polygonal approximation to the
data has been obtained, the distance of an edgel from
the preceding vertex is used as an estimate for the
distance between the two points, along the curve. In
this fashion we construct the ψ-s plot for each set of
edgels. The ψ-s curve has some interesting
properties of which we would like to make the reader aware.
Translation of a curve in the image-plane does not
affect its ψ-s plot, rotation corresponds to a shift in
the ψ-axis by the angle of rotation and change of scale
corresponds to a proportional stretching or shrinking
of the s-axis. Closed contours in the image plane
have periodic ψ-s plots. Further, the slope of the
ψ-s plot, i.e. dψ/ds, gives the curvature of the
corresponding point on the curve in the image-plane.
It follows that straight lines in the image-plane
manifest themselves as zero-slope segments in the ψ-s
plane, inflection points map onto extrema, and circles
onto constant-slope segments.
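
The "coast-line" effect and the polygon-based arc-length estimate can be illustrated with a small sketch. The function names are hypothetical, the polygon vertices are assumed to be given as indices into the ordered edgel list (the split-and-merge fit itself is not shown), and the zig-zag test data is synthetic.

```python
import math

def cumulative_length(points):
    """Naive arc length: sum of the distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def arc_lengths(points, vertex_idx):
    """Estimate arc length at each point from a polygonal approximation.

    `vertex_idx` holds the indices (into `points`) of the polygon vertices,
    in order, starting at 0. The estimate for a point is the polygon length
    up to the preceding vertex plus its distance from that vertex.
    """
    s, base, v = [], 0.0, 0
    for i, p in enumerate(points):
        # Advance to the next polygon side once its end vertex is passed.
        while v + 1 < len(vertex_idx) and i > vertex_idx[v + 1]:
            base += math.dist(points[vertex_idx[v]], points[vertex_idx[v + 1]])
            v += 1
        s.append(base + math.dist(points[vertex_idx[v]], p))
    return s

# Scattered samples of the straight segment from (0, 0) to (100, 0).
points = ([(0.0, 0.0)]
          + [(float(i), 0.5 * (-1) ** i) for i in range(1, 100)]
          + [(100.0, 0.0)])
naive = cumulative_length(points)          # over-shoots the true length of 100
estimate = arc_lengths(points, [0, 100])   # single polygon side
```

With half-pixel scatter on unit-spaced samples, the naive sum exceeds the true length by roughly forty percent, while the polygon-based estimate at the last point recovers the length of the segment.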

Fig. 4. Guided conic fitted to data with an
internal inflection point.

Having obtained the ψ-s plot we first seek out
straight lines in the image. We begin by thresholding
on the curvature and length of segments of the
polygonal-fit in the image-plane to obtain candidate
straight-lines. The curvature is estimated from the
slope of the best-fit (in the least-squares sense) linear
approximation in the ψ-s plane. Ideally, straight
lines must have zero curvature. Our threshold on
maximum curvature was 1°/pixel. This corresponds
to a circle of radius 60 pixels approximately. The
purpose of the threshold on minimum length is to
avoid fitting straight lines to low curvature segments
of larger curves with varying curvature, e.g. portions
of "flat" ellipses. Thresholding on curvature
obviously does not distinguish between straight lines and
curves with curvature less than the threshold, even
though the orientation at the two end-points of the
segment may be distinctly different. Hence, we also
threshold on the difference in the orientation estimates
(obtained from the linear-fit in the ψ-s plane) at the
end-points. We also check if the constant-orientation
hypothesis is satisfied in the ψ-s plane. Adjacent
straight-line candidates must either merge into a
single straight line or they must have a tangent
discontinuity between them. Tangent discontinuities are
checked by comparing the orientation estimates for
the common datum from the two adjacent linear-fits
in the ψ-s plane. Merging is successful if the
curvature remains within bounds, the total change in
orientation is within limits, the constant-orientation
hypothesis is valid in the ψ-s plane and a linear-fit in
the image plane is successful.
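
The first part of the straight-line test can be sketched as a least-squares line fit in the ψ-s plane followed by three thresholds. Only the 1°/pixel curvature limit comes from the text; the function names and the values `min_len` and `max_turn` are placeholders chosen for illustration, and the constant-orientation hypothesis test and the merging of adjacent candidates are not shown.

```python
import math

def fit_line(s, psi):
    """Least-squares fit psi = slope * s + intercept; returns (slope, intercept)."""
    n = len(s)
    ms, mp = sum(s) / n, sum(psi) / n
    sxx = sum((x - ms) ** 2 for x in s)
    sxy = sum((x - ms) * (y - mp) for x, y in zip(s, psi))
    slope = sxy / sxx
    return slope, mp - slope * ms

def is_straight_candidate(s, psi, max_curv=1.0, min_len=10.0, max_turn=5.0):
    """psi in degrees, s in pixels; the slope is the curvature in degrees/pixel."""
    slope, _ = fit_line(s, psi)
    length = s[-1] - s[0]
    turn = abs(slope) * length  # end-point orientation difference of the fit
    return abs(slope) <= max_curv and length >= min_len and turn <= max_turn

# A curvature of 1 degree/pixel is that of a circle of radius 180/pi pixels.
radius = 1.0 / math.radians(1.0)
```

The example radius evaluates to about 57.3 pixels, consistent with the "approximately 60 pixels" quoted above. An arc turning at 0.5°/pixel over 30 pixels passes the curvature threshold but changes orientation by 15°, which is why the end-point orientation difference is thresholded as well.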

Having found straight-lines, we now turn our attention to inflection
points. Inflection points on a curve are points at which the curvature
changes sign. Therefore, they appear as extrema in the ψ-s plot. It is
our speculation that they play an important role in human vision.
Unlike extrema in curvature, they are preserved under perspective
projection. This is easily verified by noting that straight lines
remain straight under perspective projection. We find extrema in the
ψ-s plane by using the hysteresis smoothing mechanism [Duda & Hart'73].
It is a non-linear process for finding "significant" extrema.
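The idea of keeping only "significant" extrema can be sketched as follows. This is a minimal illustration of a hysteresis-style detector, not necessarily the exact procedure of [Duda & Hart'73]; the function name `significant_extrema` and the threshold parameter `h` are assumptions introduced here.

```python
def significant_extrema(values, h):
    """Return (index, kind) pairs for extrema whose swing exceeds h.

    A running maximum is accepted as a significant maximum only after
    the signal has dropped more than h below it, and symmetrically for
    minima, so ripples smaller than h are ignored. The first sample is
    never reported, because it is an end-point rather than an extremum.
    """
    extrema = []
    if not values:
        return extrema
    hi_i = lo_i = 0      # indices of the running maximum and minimum
    state = None         # None until the first swing larger than h
    for i, v in enumerate(values):
        if v > values[hi_i]:
            hi_i = i
        if v < values[lo_i]:
            lo_i = i
        if state in (None, "max") and values[hi_i] - v > h:
            if state == "max" or hi_i > 0:
                extrema.append((hi_i, "max"))
            lo_i = i
            state = "min"
        elif state in (None, "min") and v - values[lo_i] > h:
            if state == "min" or lo_i > 0:
                extrema.append((lo_i, "min"))
            hi_i = i
            state = "max"
    return extrema


# Hypothetical orientation samples psi(s) with a small ripple near index 4.
psi = [0, 1, 2, 3, 2.9, 3.1, 2, 1, 0, 0.1, 1, 2]
found = significant_extrema(psi, 0.5)
```

With `h = 0.5` the ripple between indices 3 and 5 is absorbed, and only the maximum and the minimum that separate large swings are kept; these would be the candidate inflection points.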

Now we address the issue of knot placement. We begin by placing knots
at the ends of straight lines and at inflection points. As mentioned
previously, inflection points cannot be represented by conic sections.
If we try to fit a conic to data with an internal inflection point,
the situation shown in Fig. 4 will result. In fact, we must explicitly
check for this pathological case before we attempt conic-fitting
because, inevitably, some inflection points would have escaped
detection. To position the other knots we obtain polygonal-fits to
portions of the ψ-s plot between the knots already chosen. The
algorithm used is once again of the split and merge variety from
[Pavlidis'82]. The corners of the polygonal-fit are used as an initial
choice for the remaining knots. The segments between these knots
correspond to data which can approximately be described by a circle,
i.e. a constant-curvature curve, in the image-plane. Hence, we expect
the family of conics to be sufficient to describe them. We have also,
in an indirect fashion, achieved orientation compatibility between the
edgel-data and the fitted conic in the image plane. The final number
and placement of the knots, chosen from the polygonal-fit in the ψ-s
plane, is determined by a split and merge algorithm which keeps the
least mean-squares-error and the maximum unsigned error of the
best-fit conic, in the image-plane, within bounds. As mentioned in
Appendix I, our error-distance estimates are never smaller than the
actual error. Hence, the error bounds are strictly satisfied, although
this may be at the expense of extra knots. Systematic errors, due to
the insufficiency of the basis, can be discouraged by modulating the
error thresholds to favor crossings of the fitted curve and the
underlying data, i.e. sign changes in the error-distances of ordered
edgel-data are favored.
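The initial knot choice from a polygonal-fit can be illustrated with the split half of a split and merge scheme. This is a sketch only: the merge step and the exact procedure of [Pavlidis'82] are not reproduced, the names `point_line_distance`, `split_breakpoints` and `tol` are assumptions, and for simplicity the perpendicular distance treats s and ψ as if they had the same units.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def split_breakpoints(points, tol):
    """Indices of the corners of a polygonal fit to ordered (s, psi) points.

    A chord is split at its worst-fitting point whenever that point lies
    farther than tol from the chord; the recursion stops when every
    point is within tol of its chord.
    """
    def split(first, last):
        worst, worst_i = 0.0, None
        for i in range(first + 1, last):
            d = point_line_distance(points[i], points[first], points[last])
            if d > worst:
                worst, worst_i = d, i
        if worst_i is None or worst <= tol:
            return []
        return split(first, worst_i) + [worst_i] + split(worst_i, last)

    if len(points) < 2:
        return list(range(len(points)))
    last = len(points) - 1
    return [0] + split(0, last) + [last]


# Hypothetical psi-s data: a straight line (constant psi) followed by a
# circular arc (psi increasing linearly with s).
data = [(s, 0) for s in range(5)] + [(s, s - 4) for s in range(5, 9)]
corners = split_breakpoints(data, 0.1)
```

Here the single interior corner falls where the constant segment meets the linearly increasing one, which is the point that would be offered as an initial knot.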

An often mentioned example in support of knot placement at curvature
extrema is Attneave's cat [Attneave'54]. We believe that the choice of
knots must necessarily be guided by the choice of the family of curves
one uses to represent the data. The fact that Attneave found that a
cat's outline was still easily recognizable when it was represented by
straight lines joining curvature-maxima only indicates that knot
placement at curvature-extrema is a good idea if one is using a
polygonal approximation. This is fairly intuitive if one considers the
Taylor-Series expansion of a curve in the ψ-s plane. As mentioned
before, a straight line in the image-plane corresponds to a constant
in the ψ-s plane. Therefore, if we represent a curve in the
image-plane by straight-line segments, it corresponds to
representation by constant segments in the ψ-s plane. From the
Taylor-Series expansion for a function about a point, it follows that
the error term for a function approximated by a constant depends on
its first-derivative in the interval of approximation. Hence, we would
like to locate the first-derivative extrema near the bounds of the
approximating interval. But these extrema in the ψ-s plane are
precisely the points of maximum curvature in the image-plane. By the
same argument, it seems reasonable to locate the knots at maxima of
d²ψ/ds² for approximation by circular arcs in the image-plane. This
stream of argument supports our strategy of choosing knots.
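The Taylor-Series argument can be written out explicitly. The following is a sketch under the assumption that ψ(s) is twice continuously differentiable between two knots; the symbols s_0 (expansion point), ξ (some point between s_0 and s) and κ (curvature) are introduced here for illustration.

```latex
% Constant approximation in the psi-s plane (straight line in the image-plane):
% the error is governed by the first derivative, i.e. the curvature.
\psi(s) - \psi(s_0) = \psi'(\xi)\,(s - s_0),
\qquad \psi'(s) = \frac{d\psi}{ds} = \kappa(s)

% Linear approximation in the psi-s plane (circular arc in the image-plane):
% the error is governed by the second derivative.
\psi(s) - \bigl[\psi(s_0) + \psi'(s_0)\,(s - s_0)\bigr]
  = \tfrac{1}{2}\,\psi''(\xi)\,(s - s_0)^2,
\qquad \psi''(s) = \frac{d^2\psi}{ds^2}
```

In both cases the error grows with the distance from s_0, so placing the points where the governing derivative is largest at the ends of the interval keeps them from dominating the error inside it.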

Besides discovering straight-lines and inflection-points, one may also
want to detect tangent-discontinuities because of their special role
in human vision. In all its generality, this is a hard problem. The
finite size of the edgel-detection operator causes the failure of the
straight-edge-segment hypothesis used by edge-detectors, near corners.
Consequently, the orientations of edgels detected near corners are
"blurred" versions of the actual orientation. This "blur" is over and
above the relatively small amount of "blur" introduced by the imaging
system. Because of the "smoothing" of corners, there is no way to
distinguish between them and edge-segments with curvature comparable
to that of "smoothed" corners without going back to the image. The
problem becomes more acute as the size of the edge operator is
increased. Keeping the above observations in view, we believe that an
advisable strategy is to obtain candidate corners from the edgel data
and then check the hypothesis in the image. We have not implemented
the suggested approach. Instead, we have devised a simple procedure to
detect high-curvature segments whose lengths are explicable as
responses, for the given edge-detector, to corners, i.e. if the length
of a high-curvature segment is comparable to the edge-operator-width,
then it is a candidate corner. High-curvature segments can be obtained
by thresholding on the slope of the polygonal-fit segments in the ψ-s
plane. To avoid responding to noise, a lower bound is placed on the
total angle change between the ends of candidate segments. The
declaration of corners at segments of curves with smoothly varying
curvature, e.g. ends of "flat" ellipses, is discouraged by insisting
on a significant curvature change between the candidate corner-segment
and the adjoining regions. From the arguments above, it is clear that
the edgel-data at corners is unreliable. Hence, as we do not go back
to the image for further information, corners are localized by
extrapolating the tangents at the ends of the corner segments. It is
worthwhile to notice that tangent-discontinuity detection in the
image-plane is equivalent to step-detection in the ψ-s plane. Similar
to step-edge detection in the image, it is easier in the ψ-s plane to
detect steps which are flat on their two sides than those which have
large slopes, i.e. corners between straight lines are easier to detect
than those between curves.
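Localizing a corner by extrapolating the tangents at the two ends of a corner segment amounts to intersecting two lines. The sketch below assumes each end is given as a point and an orientation in radians; the function name `intersect_tangents` and the tolerance `eps` are hypothetical and not taken from the paper.

```python
import math


def intersect_tangents(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect the line through p1 with orientation theta1 and the
    line through p2 with orientation theta2.

    Returns the intersection (x, y), or None when the two tangents are
    (nearly) parallel and therefore do not define a corner.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]      # cross product d1 x d2
    if abs(denom) < eps:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * d2[1] - ry * d2[0]) / denom      # distance along the first tangent
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# A horizontal tangent leaving (0, 0) and a vertical tangent through (3, 5).
corner = intersect_tangents((0.0, 0.0), 0.0, (3.0, 5.0), math.pi / 2)
```

The unreliable edgel-data inside the corner segment is never consulted; only the end-points and their orientation estimates enter the computation.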

As the reader has probably noticed, we have been fairly sketchy in
this section and the previous one. The reasons are two-fold. First, we
were apprehensive about getting the reader bogged down in details
without conveying to him the central ideas of the approach. Some of
the issues we have dealt with, e.g. linking, conic-fitting,
knot-placement, straight-line discovery, corner-detection, are
involved enough to have spawned separate papers on each one of them.
It is difficult to discuss all their nuances in the course of a single
paper. Secondly, we hope to illustrate many of the concepts introduced
in our discussion with an example in the next section.


IV.  An  Example 

We now present a detailed example illustrating the working of our
algorithms. Fig. 5-a is a (64 x 64) image of an industrial part. Fig.
5-b shows the edgels detected by the Nalwa Operator [Nalwa'84] mapped
onto the connectivity-grid. Fig. 5-c is the resulting
minimally-connected graph. We chose one of the linked edges to
demonstrate our curve-fitting mechanism. Fig. 5-d shows the plot of
the edgel-centers on the image-plane. Fig. 5-e shows the polygonal
approximation to these edgels. Fig. 5-f is the resulting ψ-s plot. The
detected straight-lines, inflection-points and tangent-discontinuities
are marked on this figure. The initial choice of knots, obtained from
a polygonal-approximation to the ψ-s plot, is also shown. Fig. 5-g is
the image-plot corresponding to Fig. 5-f. Finally, in Fig. 5-h we show
the fitted conic and the final set of knots. For reasons mentioned in
the previous section, the localization of the corners is based on
extrapolation of the tangents at the adjoining knots and not on the
edgel-data there. The standard-deviation of the error-distance of the
edgel-centers from the fitted curve is less than 0.2 pixels and the
maximum unsigned error is less than 0.5 pixels.

Fig. 6-a is an image of a blocks-world scene. Fig. 6-b is the
corresponding mapping of the detected edgels onto the
connectivity-grid and Fig. 6-c shows the fitted curves.

Fig. 5-a. Example : Original Image (64 x 64).

Fig. 5-b. Example : Detected edgels mapped onto the connectivity-grid
(the largest dots indicate edgel-centers while the medium ones
indicate edgel-extensions).

Fig. 5-c. Example : Minimally-connected graph obtained by thinning
Fig. 5-b (the largest dots indicate edgel-centers while the medium
ones indicate edgel-extensions).

Fig. 5-d. Example : Edgel-centers of selected edge.


Fig. 5-e. Example : Polygonal-fit for data in Fig. 5-d.

Fig. 5-f. Example : ψ-s curve with distance estimates obtained from
the polygonal-fit in Fig. 5-e. The letters indicate the initial choice
of knots based on a polygonal-fit in the ψ-s plane.

Fig. 5-g. Example : The image-plane, corresponding to Fig. 5-f,
showing the initial placement of knots.

Legend (Figs. 5-f and 5-g):
e-f, p-q : straight-lines
q, o : inflection-points
f-q, o-p : discontinuities (in tangent)

Fig. 5-h. Example : The final knot placement in the image-plane. The
figure also shows the fitted straight-lines and guided-conics.
Guided-conics are not fitted between knots corresponding to adjacent
edgels because these conics are underconstrained. In the figure, such
knots are connected by straight-lines.

Fig. 6-a. Blocks-World Scene : Original Image (256 x 256).

Fig. 6-b. Blocks-World Scene : Detected edgels mapped onto the
connectivity-grid.


V.  Conclusion 

In the course of this paper, we addressed some of the issues which
arise when one attempts to aggregate edgels into extended edges and
then describe these edges by a family of curves. Our edgel-linking
algorithm is simple and has a local character. It relies only on edgel
proximity and orientation. Unlike most previous attempts, there are
few heuristics or thresholds. Conic splines were chosen for
curve-fitting the aggregated edgel-data. Like any other set of curves,
conics have their accompanying merits and failings. It is a
non-trivial problem to estimate the distance of a point from a conic.
We presented an approximate solution and indicated the conditions for
its validity. It was also shown why the exact solution is highly
impractical. The error-criterion used for curve-fitting was the
sum-of-squares of the distances of the edgel-centers from the conic.
Our strategy to choose knots was justified on the basis of the
Taylor-Series approximation to the ψ-s plot of the edgel-data. The
final number and placement of the knots was based on the satisfaction
of upper bounds on the mean-square-error and the maximum unsigned
error for the edgel-data in the image-plane. Orientation compatibility
between the edgel-data and the fitted conics was indirectly achieved.
The other problems examined included the discovery of straight lines
and the detection of corners and inflection points. The algorithms
were demonstrated with a detailed example.

It was not our intention to give detailed algorithms in this paper.
Only a broad outline of our approach was presented. The details of the
specific implementation are many and may vary.

We do not claim to have general-purpose and robust solutions to the
linking and curve-fitting problems. Instead, we propose overall
strategies which seem to offer promise. The problems considered here,
in all their generality, are very involved. Some of the issues which
demand further investigation include the localization of
edge-terminations, the detection and localization of junctions, the
discovery and correction of systematic errors in the fitted curves and
the development of mechanisms to correct for errors introduced due to
the invalidity of the edge-detector hypotheses. We must also develop
non-arbitrary mechanisms to choose the unavoidable thresholds. As the
detection and the localization of high-contrast edges is, in general,
more reliable than that of low-contrast edges, it is advisable to vary
the thresholds on the basis of the contrast. Consideration of the
edgel-contrast may also be helpful during the linking stage.

Adequacy, and not efficiency, was our chief concern in the development
of our algorithms. Hence, we have neither done a systematic analysis
of the timings associated with the various components nor have we made
an effort to improve them.


Fig. 6-c. Blocks-World Scene : Linked edgels fitted with
straight-lines and guided-conics.


Appendix I : Approximate Distance of a Point
from a Conic


Consider the conic,

$$C(x,y) = a\,x^2 + b\,xy + c\,y^2 + d\,x + e\,y + f = 0$$

The distance, $d_o$, of a point, $P_o = (x_o, y_o)$, from the conic is
measured along a perpendicular dropped from $P_o$ onto $C(x,y) = 0$.
We seek to estimate this distance.

The conic-section can be viewed as the intersection of the x-y plane
with a surface $C(x,y)$ as shown in Fig. 7. The gradient at any point
on the conic is perpendicular to the conic at that point. Hence, the
distance, $d_o$, of a point, $P_o$, from the conic is measured along
the gradient-direction at some point, $P$, on the conic. If $P$ is
close to $P_o$ then it is reasonable to assume that the direction of
the gradient at $P_o$ closely approximates that at $P$. With this
assumption it is straightforward to obtain a closed form solution for
$d_o$.

The direction of the gradient at $P_o$ is

$$\theta = \tan^{-1}\left[\frac{\partial C/\partial y}{\partial C/\partial x}\right],$$

where $\theta$ is measured clockwise from the x-axis. Now consider the
curve obtained from the intersection of $C(x,y)$ with a vertical plane
passing through $P_o$ and oriented in the direction of the gradient.
Let us call its horizontal axis the d-axis and let the curve be
represented as $Q(d)$. Further, let the origin be at $P_o$ in this
reference-frame. Fig. 8 illustrates a typical situation. Now let us
write out the 1-D Taylor Series Expansion about the point $P_o$ in
terms of the directional derivatives of $C(x,y)$. We will use a
subscript $\theta$ to indicate a directional derivative of $C(x,y)$ in
the direction $\theta$.

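$$Q(d) = Q(0) + Q'(0)\,d + \tfrac{1}{2}\,Q''(0)\,d^2 + \text{higher order terms}$$

where

$$Q(0) = C(x_o, y_o)$$

$$Q'(0) = C'_\theta(x_o, y_o) = \pm\,|\nabla C(x_o, y_o)|$$

$$Q''(0) = C''_\theta(x_o, y_o) =
\begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix}
\begin{pmatrix}
\dfrac{\partial^2 C}{\partial x^2} & \dfrac{\partial^2 C}{\partial x\,\partial y} \\
\dfrac{\partial^2 C}{\partial x\,\partial y} & \dfrac{\partial^2 C}{\partial y^2}
\end{pmatrix}
\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}
\quad \text{at } (x_o, y_o)$$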


Fig. 7. The conic $C(x,y) = 0$ is shown as the intersection of the
surface $C(x,y)$ with the x-y plane. (Labels in the figure: the
gradient direction, the point $P_o = (x_o, y_o)$, the distance $d_o$
and the surface $C(x,y)$.)

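Substituting
$\cos\theta = \dfrac{\partial C/\partial x}{|\nabla C|}$ and
$\sin\theta = \dfrac{\partial C/\partial y}{|\nabla C|}$, we get

$$C''_\theta = \frac{
\left(\dfrac{\partial C}{\partial x}\right)^2 \dfrac{\partial^2 C}{\partial x^2}
+ 2\,\dfrac{\partial C}{\partial x}\,\dfrac{\partial C}{\partial y}\,\dfrac{\partial^2 C}{\partial x\,\partial y}
+ \left(\dfrac{\partial C}{\partial y}\right)^2 \dfrac{\partial^2 C}{\partial y^2}
}{|\nabla C|^2}$$

evaluated at $(x_o, y_o)$.

The higher order terms can immediately be seen to be 0 because
$C(x,y)$ contains at most second order terms in x and y.

Noting that $Q(d) = 0$ at the intersection of $Q(d)$ with the conic,
it follows that the distance of $P_o$ from the conic can be estimated
from the roots of the quadratic $Q(d) = 0$.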


$$d_o = \min \left| \frac{ |\nabla C(x_o,y_o)| \pm \sqrt{ |\nabla C(x_o,y_o)|^2 - 2\,C(x_o,y_o)\,C''_\theta(x_o,y_o) } }{ C''_\theta(x_o,y_o) } \right|$$


Note that the only approximation involved is that $P_o$ is close
enough to $P$ so that the direction of the gradient at the two points
is nearly the same. If the roots of the quadratic are imaginary, then
this assumption is obviously violated. The line passing through $P_o$
in the direction $\theta$ in the x-y plane, in general, intersects the
conic at two points as shown in Fig. 8. We choose the smaller
distance, invoking the closeness assumption once again.

If $\left| \dfrac{C\,C''_\theta}{|\nabla C|^2} \right|$ at
$(x_o, y_o)$ is small, then

$$d_o \approx \frac{|C(x_o, y_o)|}{|\nabla C(x_o, y_o)|}$$

This expression has been previously used [e.g. Turner'74] to estimate
the distance. However, its limitations have not been noted. It is
valid under the closeness assumption and the assumption that
$\left| C\,C''_\theta \right| / |\nabla C|^2$ at $(x_o, y_o)$ is
small. To see the error in using this expression, consider the conic
$C(x,y) = (a\,x + b\,y + c)^2 = 0$, which represents a straight line.
The distance estimated by the gradient approximation can easily be
seen to be

$$\frac{|a\,x_o + b\,y_o + c|}{2\sqrt{a^2 + b^2}}$$

This, of course, is off by a factor of 2. The reader should observe
that while our distance estimate is never less than the actual
distance, the gradient estimate may be.

Fig. 8. A typical profile for the curve $Q(d)$ is shown. $Q(d)$ is
obtained from the intersection of the surface $C(x,y)$ with a vertical
plane passing through $P_o$ and oriented in the direction of the
gradient there.
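The two estimates of Appendix I can be compared numerically. The sketch below follows the quadratic Q(d) = 0 built from the Taylor expansion, with the coefficient order (a, b, c, d, e, f) of the conic given above; the function name `conic_distance_estimates`, the return convention and the small tolerance on the discriminant are assumptions made for this illustration.

```python
import math


def conic_distance_estimates(coeffs, x, y):
    """Return (quadratic_estimate, gradient_estimate) of the distance of
    (x, y) from the conic a x^2 + b xy + c y^2 + d x + e y + f = 0.

    quadratic_estimate is None when the quadratic Q(d) = 0 has no real
    root, i.e. when the closeness assumption is violated. Both values
    are None when the gradient vanishes at (x, y).
    """
    a, b, c, d, e, f = coeffs
    value = a * x * x + b * x * y + c * y * y + d * x + e * y + f   # Q(0)
    gx = 2 * a * x + b * y + d                                      # dC/dx
    gy = b * x + 2 * c * y + e                                      # dC/dy
    g2 = gx * gx + gy * gy
    if g2 == 0:
        return None, None
    g = math.sqrt(g2)                                               # |Q'(0)|
    gradient_estimate = abs(value) / g
    # Second directional derivative along the gradient direction, Q''(0).
    second = (gx * gx * 2 * a + 2 * gx * gy * b + gy * gy * 2 * c) / g2
    if second == 0:
        # Q(d) is linear, so the gradient estimate is the root itself.
        return gradient_estimate, gradient_estimate
    disc = g2 - 2 * value * second
    if disc < 0:
        if disc < -1e-12 * g2:
            return None, gradient_estimate
        disc = 0.0          # treat round-off as a double root
    root = math.sqrt(disc)
    quadratic_estimate = min(abs((-g + root) / second),
                             abs((-g - root) / second))
    return quadratic_estimate, gradient_estimate


# The degenerate conic (3x + 4y)^2 = 0 is the line 3x + 4y = 0; the true
# distance of (1, 1) from it is 7/5.
line_quadratic, line_gradient = conic_distance_estimates((9, 24, 16, 0, 0, 0), 1, 1)

# Unit circle x^2 + y^2 - 1 = 0; the true distance of (2, 0) is 1.
circle_quadratic, circle_gradient = conic_distance_estimates((1, 0, 1, 0, 0, -1), 2, 0)
```

For the squared line the quadratic estimate recovers the true distance while the gradient estimate is half of it, which is the factor of 2 noted above; for the circle the quadratic estimate is again exact and the gradient estimate falls short.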


Appendix II : Exact Distance of a Point
from a Conic


We indicate here how one may proceed to find the exact distance of a
point, $P_o = (x_o, y_o)$, from a conic, $C(x,y) = 0$. As discussed in
Appendix I, this distance is measured along the gradient-direction at
some point on the conic. Hence, we want to find a point, $P = (x,y)$,
on the conic such that the vector connecting $P$ and $P_o$ is parallel
to the gradient at $P$, i.e.

$$(x - x_o)\,\frac{\partial C}{\partial y}(x,y) - (y - y_o)\,\frac{\partial C}{\partial x}(x,y) = 0$$

As the partial derivatives of $C(x,y)$ are linear functions of x and
y, we have a second order equation in x and y. Now if we express x and
y in their parametric form (see [Pavlidis'83]),

$$x(t) = \frac{x_0 + x_1 t + x_2 t^2}{w_0 + w_1 t + w_2 t^2}, \qquad
y(t) = \frac{y_0 + y_1 t + y_2 t^2}{w_0 + w_1 t + w_2 t^2}$$

we get a quartic equation in the parameter, t. Quartic equations have
closed form solutions, although they are too involved to be of
practical use. Solving for t in the admissible range (typically 0 to
1), it is straightforward to find $P$ and consequently the distance of
$P_o$ from the conic.


Acknowledgement 

V.S.N. would like to thank his advisor, Tom Binford,
for many useful discussions.


References

[Attneave'54] F. Attneave: "Some Informational Aspects of Visual
Perception," Psychological Review, 1954, Vol. 61, No. 3, 183-193.

[Ballard & Brown'82] D.H. Ballard, C.M. Brown: "Computer Vision,"
Prentice-Hall Inc., Englewood Cliffs, 1982.

[Duda & Hart'73] R.O. Duda, P.E. Hart: "Pattern Classification and
Scene Analysis," John Wiley & Sons Inc., New York, 1973.

[Nalwa'84] V.S. Nalwa: "On Detecting Edges," Proc. Image
Understanding Workshop, New Orleans, Oct. 1984, 157-164.

[Pavlidis'83] T. Pavlidis: "Curve Fitting with Conic Splines," ACM
Trans. on Graphics, Vol. 2, No. 1, Jan. 1983, 1-31.

[Pavlidis'82] T. Pavlidis: "Algorithms for Graphics and Image
Processing," Computer Science Press Inc., Rockville, 1982.

[Turner'74] K. Turner: "Computer Perception of Curved Objects using a
Television Camera," Ph.D. Thesis, A.I. Lab., Univ. of Edinburgh, Nov.
1974.




Introducing  a  Smoothness  Constraint  in  a  Matching 
Approach  for  the  Computation  of  Displacement  Fields 


P. Anandan and Richard Weiss
Computer and Information Sciences Department
University of Massachusetts
Amherst, MA 01002


Abstract 

Correlation matching techniques for computing displacement fields
from successive frames in a dynamic image sequence are known to be
error-prone. Our previous work [Anan84] has been concerned with
identifying the sources of these errors and computing a confidence
measure. Here we formulate a smoothness constraint that is useful for
improving unreliable matches in an image based on the reliable ones.
We note the relationship between our formulation and the gradient
based formulation of the smoothness constraint. We provide a
hierarchical matching algorithm that includes our smoothness
constraint and show preliminary results of applying our algorithm to
real images.

1  Introduction 

Intensity based techniques for the computation of displacements at
image points between a pair of image frames rely on the similarity of
the light intensity reflected from a scene location in the two frames.
Such techniques include gradient and correlation techniques.

In the traditional formulation of the gradient techniques, an
intensity constancy assumption is used to provide a partial constraint
on the displacement vector at each point in the image. A smoothness
constraint on the displacement field is then used to uniquely
determine the displacements [Horn81, Hild83] of all the points. The
correlation techniques usually provide a unique displacement vector
based on local matches, without incorporating any smoothness
constraint. Hence, obvious local errors in the displacement fields
computed by the correlation techniques go undetected.

In our previous work [Anan84], we identified some common sources of
matching-errors during correlation and provided a scalar confidence
measure with the estimated displacement at an image point. This
measure, which indicated the reliability of the associated
displacement estimate, used information already available during the
correlation process and hence did not require significant additional
computation.

This research was supported by DARPA under grant N00014-S3-K-0404.


Although our previous measure was useful in determining the
reliability of a displacement estimate, we indicated that since the
displacement is a vector quantity, a vector valued confidence measure
may be more appropriate. This paper provides such a measure and
formulates a smoothness constraint on the displacement field that uses
our new confidence measure.

Intuitively, we regard a displacement field to be "smooth" over an
area of the image if its variation over the area is small. An example
of a measure of the spatial variation of a displacement field is

$$E_{smooth}(U) = \iint |\nabla U|^2 \, dx \, dy$$

where $U = (u, v)$ is the displacement vector at $(x, y)$ (with u and
v as its components in the x and y directions respectively), and the
domain of the integral is an area of the image (possibly the whole
image). This is due to Horn and Schunck [Horn81], who use this measure
in a gradient-based approach for the computation of optical flow.
Another example of such a measure is provided by Terzopoulos [Terz84],
which we will describe in detail later in this document.
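A discrete version of this spatial-variation measure can be sketched with forward differences on a pixel grid. This is an illustration only, assuming unit grid spacing and |∇U|² taken as the sum of the squared gradients of the two components; the function name `smoothness_energy` is hypothetical and the finite-element discretization used later in the paper is not reproduced.

```python
def smoothness_energy(u, v):
    """Sum of squared forward differences of both displacement
    components over a rectangular grid.

    u and v are lists of rows of equal length holding the x and y
    components of the displacement at each pixel.
    """
    rows, cols = len(u), len(u[0])
    total = 0.0
    for comp in (u, v):
        for r in range(rows):
            for c in range(cols):
                if c + 1 < cols:       # horizontal difference
                    total += (comp[r][c + 1] - comp[r][c]) ** 2
                if r + 1 < rows:       # vertical difference
                    total += (comp[r + 1][c] - comp[r][c]) ** 2
    return total


# A uniform translation has no spatial variation.
flat = smoothness_energy([[1, 1, 1], [1, 1, 1]], [[2, 2, 2], [2, 2, 2]])

# A field whose x component grows by one pixel per column.
ramp = smoothness_energy([[0, 1, 2], [0, 1, 2]], [[0, 0, 0], [0, 0, 0]])
```

The uniform field gives zero, while the ramp contributes one unit for each of its four horizontal neighbor pairs.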

The formulation provided here is the minimization of the sum of two
errors, $E_{smooth}$ and $E_{approx}$. Given a displacement field,
$E_{smooth}$ measures its spatial variation and $E_{approx}$ measures
its deviation from the initial displacement vectors computed by the
matching process.

Our choice of the approximation error is based on the results of a
matching process. Each displacement vector is represented in a
convenient local ortho-normal basis $(e_{max}, e_{min})$, which are
usually not parallel to $(x, y)$. For a given displacement field U,
the approximation error is a weighted sum of the deviations of the
components (along these basis vectors) of the displacement vectors in
the field from the corresponding components of known initial values
(provided by the matching process). The weights are the components of
the vector-valued confidence measures along the basis vectors.

Intuitively, the basis vectors $e_{max}$ and $e_{min}$ and the weights
$c_{max}$ and $c_{min}$ can be understood as follows: At a point along
an edge in the image, the basis vector $e_{max}$ will be approximately
oriented in the direction normal to the edge, and $e_{min}$ will be
oriented parallel to the edge. At such a point, the weight $c_{max}$
will be large and the weight $c_{min}$ will be small. On the other
hand, in an area of the image with small intensity variations, both
the weights will be small, whereas at a point along a contour with
high curvature, both the weights will be high. The precise definitions
of the basis vectors and the weights are provided in section 5.

The precise form of the approximation error is

$$E_{approx} = c_{max}\,(U \cdot e_{max} - D \cdot e_{max})^2
             + c_{min}\,(U \cdot e_{min} - D \cdot e_{min})^2$$

where $c_{max}$ and $c_{min}$ are the weights, U is the displacement
vector and D is the initial displacement vector provided by the
correlation matching algorithm, at location $(x, y)$ in the image.
Here $c_{max}$ and $c_{min}$ indicate the confidence in the components
of the displacement vector D in the directions $e_{max}$ and $e_{min}$
respectively.
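The approximation error at a single image location can be sketched directly from its definition as a confidence-weighted sum of squared deviations along the two basis vectors. The function and argument names below (`approximation_error`, `e_max`, `e_min`, `c_max`, `c_min`) are chosen for this illustration, and the basis is assumed to be ortho-normal as stated in the text.

```python
def dot(a, b):
    """Dot product of two 2-D vectors given as (x, y) tuples."""
    return a[0] * b[0] + a[1] * b[1]


def approximation_error(U, D, e_max, e_min, c_max, c_min):
    """Weighted squared deviation of the displacement U from the initial
    match D, measured along the local basis (e_max, e_min) with
    confidences (c_max, c_min)."""
    return (c_max * (dot(U, e_max) - dot(D, e_max)) ** 2
            + c_min * (dot(U, e_min) - dot(D, e_min)) ** 2)


# A point on a vertical edge: the match is reliable across the edge
# (e_max along x, c_max = 1) and unreliable along it (c_min = 0).
along_edge = approximation_error((2.0, 5.0), (2.0, 0.0),
                                 (1.0, 0.0), (0.0, 1.0), 1.0, 0.0)
across_edge = approximation_error((3.0, 0.0), (2.0, 0.0),
                                  (1.0, 0.0), (0.0, 1.0), 1.0, 0.0)
```

Sliding the displacement along the edge costs nothing, so the smoothness term is free to decide that component, whereas moving it across the edge is penalized.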

In the rest of this paper, we develop this idea and formulate it as a
relaxation process. We extend the analysis of Terzopoulos [Terz84] for
the surface-reconstruction problem to our two dimensional minimization
problem. We follow a finite-element approach to solving the problem,
primarily because this approach enables us to deal with arbitrary,
known discontinuities in the displacement field.

We also incorporate the smoothness process within the framework of a
hierarchical algorithm for the computation of displacement fields. Our
hierarchical algorithm is similar to the multi-frequency coarse-fine
techniques described in [Glaz83, Anan84]. At each level of the
hierarchy, we include the smoothness process after computing
displacement estimates by a correlation matching process.

The relationship of this minimization formulation to other
formulations is described in section 2. In section 3, the conditions
for the existence of a solution to this problem are explained, and in
section 4 the finite element approach is used to provide a discrete
relaxation algorithm. The source of the weights $c_{max}$ and
$c_{min}$, and the basis vectors $e_{max}$ and $e_{min}$, is discussed
in section 5. The incorporation of this smoothness constraint in the
hierarchical algorithm is described in section 6. Finally, some
experimental results are provided in section 7 and the scope for
future work is discussed in section 8.

2  Relationship  to  other  work 

The minimization problem posed in the previous section has its roots
in other similar work in computer vision. In particular, the
formulation of the intensity and smoothness constraints for the
computation of optic flow fields by Horn and Schunck [Horn81], the
related work by Nagel [Nage83, Nage84] and Hildreth [Hild83], and the
surface reconstruction problem posed by Terzopoulos [Terz84] are
closely related to our formulation.

First, we explain the relationship of our work to that of Horn and
Schunck. Our smoothness constraint can be chosen to be the same as
theirs. The intensity constraint they use is

$$\nabla I \cdot V + \frac{\partial I}{\partial t} = 0 \qquad (1)$$

where $\nabla I$ is the spatial gradient of the image intensity I at a
point $(x, y)$ in the image and V is the image-velocity at that point.
Under the assumption that the time interval $\delta t$ between
successive image frames is small, we can approximate V by
$D/\delta t$, where D is the displacement of an image-point, and
$\partial I/\partial t$ by $\Delta I/\delta t$. Then, we can rewrite
equation 1 as

$$\nabla I \cdot D + \Delta I = 0 \qquad (2)$$

Based on this, we can define an error

$$E = (\nabla I \cdot U + \Delta I)^2$$

which is zero when U = D, where D is any value that satisfies
equation 2 above.

This error can be rewritten as

$$E = |\nabla I|^2 \,(U \cdot e_{\nabla I} - D \cdot e_{\nabla I})^2$$

where $e_{\nabla I}$ is the unit vector in the direction of
$\nabla I$.

From equation 2, we see that
$D \cdot e_{\nabla I} = -\Delta I / |\nabla I|$. Considering our
$E_{approx}$ term described in section 1, if we set $c_{min} = 0$,
$c_{max} = |\nabla I|^2$ and $e_{max} = e_{\nabla I}$, our
approximation error is the same as that of Horn and Schunck. Indeed,
their intensity constraint simply provides values for one component of
the displacement vector at any point in the image, viz. the component
in the direction of the intensity gradient. This is because only the
first order image intensity variations are considered.
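The equivalence between the squared residual of the intensity constraint and the gradient-weighted deviation of the component along the intensity gradient can be checked numerically. The sketch below is an illustration with hypothetical function names (`intensity_error`, `weighted_normal_error`) and made-up values for the gradient, the intensity change and the displacement.

```python
import math


def intensity_error(grad, delta_i, U):
    """Squared residual of equation 2 for a candidate displacement U."""
    return (grad[0] * U[0] + grad[1] * U[1] + delta_i) ** 2


def weighted_normal_error(grad, delta_i, U):
    """The same error written as |grad I|^2 times the squared deviation
    of the component of U along the gradient from the value that
    equation 2 provides, -delta_i / |grad I|."""
    mag = math.hypot(grad[0], grad[1])
    e = (grad[0] / mag, grad[1] / mag)      # unit vector along the gradient
    d_normal = -delta_i / mag               # component of D fixed by equation 2
    u_normal = U[0] * e[0] + U[1] * e[1]
    return mag * mag * (u_normal - d_normal) ** 2


grad_i = (3.0, 4.0)     # assumed spatial intensity gradient
delta_i = -10.0         # assumed intensity change between frames
U = (1.0, 2.0)          # candidate displacement
lhs = intensity_error(grad_i, delta_i, U)
rhs = weighted_normal_error(grad_i, delta_i, U)
```

Both forms agree up to round-off, and neither changes if U is moved perpendicular to the gradient, which is why the intensity constraint fixes only one component of the displacement.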

Nagel [Nage83] points out that at certain areas in the image it may be
possible to locally obtain values for both components of the
displacement vector. This is usually true at points of high curvature
along image-contours, or in textured areas, where the second order
intensity variations are large and can be useful for obtaining the
unique displacement vector. In section 5, we explain how a correlation
matching algorithm behaves at such areas of the image.

Both our formulation and the formulation of Horn and
Schunck can be regarded as a two dimensional version of
the smoothness process presented by Terzopoulos [Terz84].
Terzopoulos is interested in visual surface reconstruction.
This leads to the problem of smoothing a scalar variable
$z$, the depth of the visible surface along each viewing
direction. Approximate values for the depth and associated
confidence measures are assumed to be known at some
locations in the image. Similarly, approximate orientation
of the surface and associated confidence measures are also
assumed to be known at selected locations.

The problem is that of minimising the sum of three errors:
a smoothness error, $E_d$, and $E_o$, where the smoothness error
is due to a surface smoothness constraint, $E_d$ is due to the known
approximate depth values, and $E_o$ is due to known approximate
orientations of the surface. Terzopoulos considers
two possible measures of the spatial variation of the




condition implies that $B$ is positive definite and therefore
$S$ has a unique minimum.

4  Solving  the  variational  problem 

In the previous section we showed that there is a unique
smooth vector field which minimises the constraints; in this
section we present methods to obtain a discrete approximation
to this solution. The choice of domain and basis
functions here is not one of the standard ones and is based
on the approach of Terzopoulos. We have extended his
results to two dimensional flow fields, and we present the
masks which represent the solution. We intend to investigate
a more standard choice of basis for the finite element
technique.

We take as our discrete solutions piecewise polynomials,
each polynomial being defined on a domain $\Omega$ in the
plane. For the membrane model, we choose a triangular
finite element as shown in figure 1. For the thin plate, we
choose a domain which consists of a set of six points on a
square grid (see figure 2).


Figure  1:  The  finite-element  domain  for  the  membrane 
model 


Figure 2: The finite-element domain for the plate model


There are two requirements for the finite element approach:
that a unique solution exists for each particular
grid, and that the solutions converge as the mesh size goes
to zero. Our choice of the finite-elements follows that of
Terzopoulos. He verifies that these elements satisfy these
two requirements.

4.1 Computation of masks

In order to solve the discrete minimisation problem, linear
equations in the values at the grid-locations are derived.
These equations are used to update the values $U = (u, v)$
at a point in terms of its neighbors. This is commonly
done by convolution with a mask; the masks for each of
the two models are given below.

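For the membrane model, the convolution mask is

$$\frac{1}{4} \times \begin{bmatrix} & 1 & \\ 1 & 0 & 1 \\ & 1 & \end{bmatrix}$$

For the thin-plate model it is

$$\frac{1}{20} \times \begin{bmatrix} & & -1 & & \\ & -2 & 8 & -2 & \\ -1 & 8 & 0 & 8 & -1 \\ & -2 & 8 & -2 & \\ & & -1 & & \end{bmatrix}$$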

Let us denote the convolution mask as $A$. Also, let
$\bar{U} = A * U$. Then, solving the discrete problem is the same
as solving the following system of coupled equations:

$$(U - \bar{U}) + c_{max} ((U - D) \cdot e_{max}) \, e_{max} + c_{min} ((U - D) \cdot e_{min}) \, e_{min} = 0$$

for each point on the image grid.


4.2 Relaxation algorithm

There are a number of numerical methods for solving the
system of coupled linear equations described above. One
of the simplest methods is the Gauss-Seidel relaxation
algorithm. This is an iterative process, where during each
iteration the value of $U$ at each point in the image is solved
in terms of the values of its neighbors.

In  our  case,  we  have 


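$$U^{k+1} = \bar{U}^k + \frac{c_{max}}{c_{max} + 1} ((D - \bar{U}^k) \cdot e_{max}) \, e_{max} + \frac{c_{min}}{c_{min} + 1} ((D - \bar{U}^k) \cdot e_{min}) \, e_{min} \qquad (12)$$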


where  the  superscripts  denote  the  number  of  the  iteration. 
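
A minimal sketch of this update for the membrane model follows, assuming the update has the form $U^{k+1} = \bar{U}^k + \frac{c_{max}}{c_{max}+1}((D - \bar{U}^k) \cdot e_{max}) e_{max} + \frac{c_{min}}{c_{min}+1}((D - \bar{U}^k) \cdot e_{min}) e_{min}$, with $\bar{U}$ taken as the mean of the four nearest neighbors. For brevity every point is updated simultaneously from the previous iterate (a Jacobi-style sweep) rather than in place as in true Gauss-Seidel. The names `neighbor_average` and `relax`, the array layout, and the edge-replicating border handling are illustrative assumptions, not details from the paper.

```python
import numpy as np

def neighbor_average(U):
    """Four-neighbor mean of an (H, W, 2) field; borders are edge-replicated."""
    P = np.pad(U, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return 0.25 * (P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:])

def relax(D, e_max, e_min, c_max, c_min, iterations=100):
    """Jacobi-style relaxation sketch (hypothetical helper).

    D, e_max, e_min: (H, W, 2) arrays (match estimates and unit basis vectors).
    c_max, c_min:    (H, W) non-negative confidence measures.
    """
    U = np.array(D, dtype=float)
    w_max = (c_max / (c_max + 1.0))[..., None]   # weights in [0, 1)
    w_min = (c_min / (c_min + 1.0))[..., None]
    for _ in range(iterations):
        U_bar = neighbor_average(U)
        R = D - U_bar                            # pull towards the match estimate
        along_max = np.sum(R * e_max, axis=-1, keepdims=True) * e_max
        along_min = np.sum(R * e_min, axis=-1, keepdims=True) * e_min
        U = U_bar + w_max * along_max + w_min * along_min
    return U
```

With both confidences very large the field stays at the match estimates; with both at zero the update reduces to pure neighbor averaging.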

From a geometric point of view, this updating scheme
can be regarded as choosing a point in the $(u, v)$ space
that is a combination of $\bar{U}^k$ and $D$. We illustrate this idea
in figure 3. For convenience, we have chosen to represent
the displacements in a cartesian coordinate system with its
axes parallel to $(e_{max}, e_{min})$. The two key parameters are
$c_{max}/(1 + c_{max})$ and $c_{min}/(1 + c_{min})$, which vary between
0 and 1, as $c_{max}$ and $c_{min}$ vary between 0 and $\infty$. Since
$c_{max} \geq c_{min}$, the location of $U^{k+1}$ will always be on or above
the line joining $D$ and $\bar{U}^k$ in figure 3. In particular, it can
be seen that the location of $U^{k+1}$ will always be within the
triangle that is shown in that figure, moving towards the
line joining $D$ and $\bar{U}^k$ as $c_{min}$ gets closer to $c_{max}$.


Figure 3: A geometrical illustration of the relaxation process

Finally, we note that although the Gauss-Seidel relaxation
scheme is simple to implement, there are more efficient
techniques to solve a sparse system of equations.
Studying these and choosing an efficient, parallel scheme
will be a part of our future work.

5 The basis vectors and the confidence measures

One of the key elements of our formulation is the approximation
constraint. The proper choice of the ortho-normal
basis vectors $e_{max}$ and $e_{min}$ and the confidence measures
$c_{max}$ and $c_{min}$ is crucial to our model and to our algorithm.
In this section we discuss how these measures can
be obtained.

In attempting to choose these measures, it is useful to
formalise our efforts in the form of a set of design considerations.
These considerations are:

1. If both components of the two dimensional displacement
vector are reliably known at an image location,
both $c_{max}$ and $c_{min}$ should be large. If one component
is reliably known and the other is not, then
$c_{max}$ should be large and $c_{min}$ small, and $e_{max}$ should
be oriented along the direction of the reliable component.
If both are unreliable then both $c_{max}$ and
$c_{min}$ should be low.

2. The computation of $c_{max}$ and $c_{min}$ should take minimum
effort in addition to what is already necessary
for the computation of the displacement vectors.


In our previous study on the behavior of error (sum
of squared-differences) surfaces [Anan84], we noted that
the shape of these surfaces contained useful information
regarding the reliability of the displacement vectors. Below,
we explain what we mean by these terms, and briefly
summarise our major observations.

The process of matching by discrete correlation consists
of choosing a window around a point of interest in
one frame (the first window), and correlating the intensity
values in that window with those in the candidate match
windows in the second frame (the second window). The
term correlation is used in a generic sense here. The measure
that we prefer (for reasons explained in [Anan84]) is
the sum of squared differences between corresponding pixel
values (SSD).

By the term error surface, we mean the surface whose
height is the SSD value corresponding to each possible
displacement. Hence, for each location of interest in the first
frame, we have available an error surface. The best match
is, indeed, the displacement corresponding to the minimum
height of the error surface (henceforth called the pit).
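
The construction of an error surface and its pit can be sketched as follows. This is an illustrative Python outline, not the authors' implementation: the function names `ssd_error_surface` and `find_pit`, the square window and search range, and the assumption that the point of interest lies far enough from the image border are all choices made here for the example.

```python
import numpy as np

def ssd_error_surface(frame1, frame2, row, col, half_window=2, max_disp=3):
    """SSD value for every candidate displacement (dy, dx) in a square range.

    Assumes (row, col) is at least half_window + max_disp pixels from every
    border, so that all windows lie inside both frames.
    """
    w = half_window
    first = np.asarray(frame1, dtype=float)[row - w:row + w + 1,
                                            col - w:col + w + 1]
    second_frame = np.asarray(frame2, dtype=float)
    size = 2 * max_disp + 1
    surface = np.empty((size, size))
    for i, dy in enumerate(range(-max_disp, max_disp + 1)):
        for j, dx in enumerate(range(-max_disp, max_disp + 1)):
            r, c = row + dy, col + dx
            second = second_frame[r - w:r + w + 1, c - w:c + w + 1]
            surface[i, j] = np.sum((first - second) ** 2)
    return surface

def find_pit(surface, max_disp):
    """Displacement (dy, dx) at the minimum of the error surface."""
    i, j = np.unravel_index(np.argmin(surface), surface.shape)
    return int(i) - max_disp, int(j) - max_disp
```

For a second frame that is an exact translate of the first, the pit sits at the true displacement and the SSD there is zero.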

In our study of the error surfaces [Anan84], we also
noted that the curvature of the error surface around the
pit, taken along any direction, seems to provide information
regarding the reliability of the match along that direction.

In particular, we observed the following:

1. At a corner point, the pit is sharp in all directions;
correspondingly all the directional curvatures are high.

2. At a point along an edge, the error surface shows a
long valley-like structure, and the orientation of the
valley corresponds to that of the edge. This means
that the curvature in the direction parallel to the
edge is low, whereas that in the direction perpendicular
to the edge is high.

3. At a point in a homogeneous area all curvatures are
low, and the error surface looks rather flat.

Figures 4, 5, and 6 show examples of error surfaces at
a corner-point, an edge-point, and a homogeneous point
respectively. The surfaces are shown inverted, in order to
enhance visibility. Note that in a small area around the
point with minimum error (the peak in these figures), the
surfaces demonstrate the properties described above.

Figure 4: The error surface at a corner point

Figure 5: The error surface at an edge point

Figure 6: The error surface at a homogeneous point

The directional curvatures of the error surface seem to
satisfy the first of our design criteria. Further, the computation
of the error surface is a necessary part of the
matching process, and the computation of the curvatures
requires only minimal additional effort. Hence, this meets
both our design criteria. Based on these, our method of
choosing the confidence measures and the basis vectors is as follows:

1. Compute the principal curvatures and the directions
of the principal axes around the pit of the error surface.
Let $C_{max}$ and $C_{min}$ be the principal curvatures
and $e_{max}$ and $e_{min}$ be the unit vectors along the
directions of the principal axes.

2. Compute the error corresponding to the location of
best match. Let this be $SSD_{min}$.

3. We choose $c_{max}$ and $c_{min}$ based on these quantities,
and we have a number of choices:

(a) We can simply take $c_{max} = C_{max}$ and $c_{min} = C_{min}$.

(b) We can normalise the principal curvatures to lie
within some desired range and to be scaled by a
desired factor, and choose $c_{max} = NORM(C_{max})$
and $c_{min} = NORM(C_{min})$. Here $NORM(\cdot)$ is
the normalising function of the curvature.

(c) We can normalise the principal curvatures using
$SSD_{min}$. We can choose

$$c_{max} = \frac{C_{max}}{k_1 C_{max} + k_2 SSD_{min} + k_3}, \qquad c_{min} = \frac{C_{min}}{k_1 C_{min} + k_2 SSD_{min} + k_3}$$

This has the effect of making the confidence inversely
proportional to the error corresponding
to the best match. Usually this error is high
when the local physical structure represented
in the image changes, due to expansion, rotation,
etc. of the image, or if the image has been
corrupted by noise. Strictly speaking, correlation
(or SSD) provides truly reliable matches
only when these effects are absent. Hence, this
type of normalisation takes into account problems
that are unexpected, but influence the error
measure.


The choice of the normalisation of the principal curvatures
to obtain the confidence measures is a crucial factor.
For the experiments described in this paper, we used a
normalisation of the type discussed in 3(c) above, choosing
$k_1 = 0$, $k_2 = 1$, and $k_3 = 100$. These choices were made
on an empirical basis, although we observed that varying
them did not significantly affect our results. This issue is
open for further research.
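
The steps above can be sketched in Python as follows. The sketch rests on several assumptions that should be read as such: the principal curvatures at the pit are approximated by the eigenvalues of a finite-difference Hessian of the error surface (reasonable because the gradient is near zero at the pit), the normalisation is taken to have the form $c = C / (k_1 C + k_2 SSD_{min} + k_3)$, and the function name `confidence_measures` is hypothetical.

```python
import numpy as np

def confidence_measures(surface, k1=0.0, k2=1.0, k3=100.0):
    """Confidences and basis vectors from an SSD error surface (sketch).

    Returns (c_max, c_min, e_max, e_min); basis vectors are in (x, y) order,
    where x indexes columns and y indexes rows of the surface.
    """
    s = np.asarray(surface, dtype=float)
    i, j = np.unravel_index(np.argmin(s), s.shape)
    if not (0 < i < s.shape[0] - 1 and 0 < j < s.shape[1] - 1):
        raise ValueError("pit lies on the border of the error surface")
    # Central second differences around the pit.
    d_xx = s[i, j + 1] - 2.0 * s[i, j] + s[i, j - 1]
    d_yy = s[i + 1, j] - 2.0 * s[i, j] + s[i - 1, j]
    d_xy = 0.25 * (s[i + 1, j + 1] - s[i + 1, j - 1]
                   - s[i - 1, j + 1] + s[i - 1, j - 1])
    hessian = np.array([[d_xx, d_xy], [d_xy, d_yy]])
    curvatures, axes = np.linalg.eigh(hessian)   # eigenvalues ascending
    C_min, C_max = curvatures
    e_min, e_max = axes[:, 0], axes[:, 1]
    ssd_min = s[i, j]
    c_max = C_max / (k1 * C_max + k2 * ssd_min + k3)
    c_min = C_min / (k1 * C_min + k2 * ssd_min + k3)
    return c_max, c_min, e_max, e_min
```

For a valley-shaped surface the returned `e_max` points across the valley (the reliable direction) and `c_min` is small, in line with the first design consideration.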




• This process is repeated at all levels up to the level



Figure 8: The initial displacement estimates provided by
the matching process. Only a quarter of the displacement
vectors are shown, in order to enhance visibility


Figure 9: The displacement field after 100 iterations of
smoothing using the membrane model


the hierarchy are sufficient for processing. At each level, we
performed 10 iterations of the relaxation algorithm. This
number was chosen arbitrarily.

In figures 13 and 14, we display the displacement fields
at the four levels of the hierarchy. The dramatic improvement
in the final results is the result of the integration of
the matching and the smoothing process in the hierarchy.
Smoothing at one level of resolution will not provide such
improvement.

In figure 15 we display the confidence measures $c_{max}$
and $c_{min}$ as intensity images. The brightness of a point in
these images is proportional to the confidence measures.
We have also superimposed the unit vectors $e_{max}$ on the
figure containing $c_{max}$, in order to demonstrate that it is
usually perpendicular to the edges in the image.

For the purposes of a closer scrutiny, in figure 16 we
display the results only at the image-resolution.




Figure 13: The results of the hierarchical matching algorithm
without the smoothing process when applied to the
road-scene image-pair.

Figure 14: The results of the hierarchical matching/smoothing
algorithm applied to the road-scene image-pair


In both the figures above, results at four levels of the hierarchy are shown. The background images are
the low pass filtered versions of the first road image. The image resolution is 128 x 128 (shown in the
bottom-right quadrant). At each level, except the 16 x 16 resolution in the top-left quadrant, we have
shown only a sample of the displacement vectors. This was done so as to enhance visibility.


Figure 15: The confidence measures $c_{max}$ (on the left) and $c_{min}$ (on the right) at the finest level of
the hierarchy. These measures are displayed here as intensity images. The left figure also contains
the unit-vectors $e_{max}$. Note that the $e_{max}$ vectors are perpendicular to the edges in the $c_{max}$ image, which
themselves usually correspond to the image edges.




Figure 16: The results at image-resolution for the road-scene image-pair


8  Future  Work 

In this paper, we provided an outline of a technique for
incorporating a smoothness constraint in a matching algorithm.
We also indicated how this may be incorporated in
a hierarchical technique. Although we demonstrated these
ideas with some experiments, many details of the algorithm
are yet to be worked out. In particular, we feel the
following areas need greater analysis and closer scrutiny.

1. The choice of the smoothness criterion. The membrane
model as well as the thin plate model are heuristics
for the type of variation we expect the displacement
field to possess. It may be possible to limit the
choice of the smoothness criterion based on an understanding
of the geometric structure of continuous
flow fields. Such analyses abound in the literature
(e.g., [Adiv85, Waxm84]).

2. The normalisation of the confidence measures. One
of the important factors that affect the outcome of
the process is the validity of the confidence measure.
At the outset, a conservative measure seems more
appropriate, i.e., one that is more prone to doubt
the reliability of the local match estimate. This will
help prevent the process from being misguided by
incorrect values with high confidence. In general, a
more thorough investigation is critical to make this
process useful in real images.

3. One of the motivations behind choosing the finite element
approach for solving the minimisation problem
is that this approach allows us to deal with known
motion and occlusion boundaries. Therefore, the
recognition of such boundaries, and the inclusion of
that information in the smoothness process, is one of
the goals of future research. We seek an understanding
of what information is available for recognising such
events and how it may be utilised.

We believe that the hierarchical matching/smoothing
algorithm is a useful technique for the computation of
dense, reliable displacement fields in real images. Many
issues remain to be solved. However, the approach takes
into consideration the limits of its various components, and
uses them appropriately. This provides us the motivation
for further investigation along this line.

Acknowledgments 

We wish to thank Dr. George Reynolds for his comments
on the manuscript, and Frank Glazer for many valuable
discussions. Thanks are also due to Brian Burns and Bill
Guasto for their help in obtaining the figures for the paper.

References

[Adiv85] Adiv, G., Determining 3-D Motion and Structure
from Optical Flow Generated by Several Moving
Objects, IEEE PAMI, Vol. PAMI-7, July 1985,
pp. 384-401.

[Anan84] Anandan, P., Computing Dense Displacement Fields
with Confidence Measures in Scenes Containing Occlusion,
SPIE Intelligent Robots and Computer Vision
Conference, Vol. 521, pp. 184-194, 1984; also
COINS Technical Report 84-32, University of Massachusetts,
December 1984.

[Burt81] Burt, P. J., Fast Filter Transforms for Image Processing,
CGIP, Vol. 16, pp. 20-51, 1981.

[Burt83] Burt, P. J., Yen, C. and Xu, X., Multi-Resolution
Flow-Through Motion Analysis, IEEE CVPR Conference
Proceedings, June 1983, pp. 246-252.

[Glaz83] Glazer, F., Reynolds, G. and Anandan, P., Scene
Matching by Hierarchical Correlation, IEEE CVPR
Conference, June 1983, pp. 432-441.

[Hild83] Hildreth, E. C., The Measurement of Visual Motion,
PhD dissertation, Dept. of Electrical Engineering
and Computer Science, MIT, Cambridge,
Ma., 1983.

[Horn81] Horn, B. K. P. and Schunck, B. G., Determining
Optical Flow, Artificial Intelligence, Vol. 17,
pp. 185-203.

[Nage83] Nagel, H. H., Displacement Vectors Derived from
Second-Order Intensity Variations in Image Sequences,
Computer Vision, Graphics, and Image Processing,
21, pp. 85-117, 1983.

[Nage84] Nagel, H. H., Constraints for the Estimation of
Displacement Vector Fields from Image Sequences,
IJCAI-83, Karlsruhe, W. Germany, pp. 945-951,
1983.

[Terz84] Terzopoulos, D., Multiresolution Computation of
Visible-Surface Representations, PhD Dissertation,
Massachusetts Institute of Technology, Jan. 1984.

[Waxm84] Waxman, A. and Wohn, K., Contour Evolution,
Neighborhood Deformation and Global Image
Flow: Planar Surfaces in Motion, CS-TR-1394,
University of Maryland, April 1984.




On  Surface  Reconstruction  Using  Sparse  Depth  Data 


by 


Terrance E. Boult and John R. Kender†


Computer Science Department, Columbia University, NYC, NY 10027.

§0  Abstract 

We report on our investigation into the problem of
reconstructing a surface given a sparse set of depth samples.
We discuss the formulation and recent history of the
problem and its solution in abstract terms, identify some
heretofore unmentioned assumptions in the formulation, and
discuss possible alternative assumptions. We show that the
space of algorithms solving the problem is large, and that
there are at least four major approaches, each of which is
briefly discussed. Then we investigate implementation
details of two of these approaches based on the use of
reproducing kernel splines. We also discuss our future
plans.


§1  Introduction 

There are a number of applications in computer vision
and robotics in which one desires to know the depth of an
object at all points in some particular region, but the only
information available is the depth (and possibly surface
orientation) on a sparse set of points. To estimate the needed
depth values we often attempt to reconstruct all or part of the
object surface. Researchers have used a number of different
approximations to the surface and an even greater number of
representations for the reconstructed surface, e.g. see Allen
(1985), Binford (1981), Mayhew (1982), Grimson
(1981), Kender, Lee and Boult (1985), and Terzopoulos
(1984). The last three of these formulate the problem in a
very similar fashion, and it is this formulation that is
discussed in this paper. This formulation defines the
problem as attempting to reconstruct the surface that the
human visual system would reconstruct if it were given the
same sparse depth data (say, in a random dot stereogram).
This formulates the problem as attempting to find the surface,
from a class of surfaces, that minimizes a given
unreasonableness function (usually a norm, or the sum of a
norm and some other penalty function).

† This work was supported in part by: NSF grant MCS-782-3673,
DARPA grant N00039-84-C-0165, an IBM
fellowship, and an NSF Presidential Young Investigators
award.

Marr (1977) noted that the problem of finding an algorithm
to solve any problem can be broken up into four
phases: formulation, solution in the abstract, analysis of
alternative realizations of the solution, and the
implementation and testing of some realization(s). Each of
these phases is important, and each may affect the others,
but they do provide a convenient way to break up the problem
into smaller pieces. We shall use this division of a
problem throughout this paper. In Section 2 we give a
precise formulation of the problems and briefly discuss their
solution in the abstract. Then we question some
assumptions implicit in the formulation (used in Grimson
(1981), Terzopoulos (1984) and Kender, Lee and Boult
(1985)), discuss alternative assumptions, and means for
choosing between these assumptions. In section 3, we
address the third phase of problem solution by discussing
four different realizations of the abstract solution. In section
4 we tackle the fourth and final phase of the division of a
problem, by analyzing details of the methods of reproducing
kernel splines, and seeing how their implementation details
differ in various imaging situations. Section 5 addresses the
future directions of our work.

§2  Formulation  and  Solution. 

We shall examine two different reconstruction problems
which have, unfortunately, been confused into one problem
in the literature. We shall refer to them as the visual surface
(hereafter VS) interpolation problem and the VS
approximation problem. A naive formulation of the VS
interpolation problem would be to find “the best
approximation” to a smooth surface, using only the knowledge
of a number of given points, and requiring the surface
to be interpolatory (i.e. passing through all the given data).
A similar formulation of the VS approximation problem
would be to find “the best approximation” to a smooth
surface using only the knowledge of a number of given
points, but allowing the surface to be chosen freely
otherwise. Because the problems are so similar, much of
our discussion shall apply to both problems.

A major difficulty with these formulations is that they are not
well posed, inasmuch as the information does not uniquely
determine the solution. In fact, given any set (of zero
measure) of points on a surface, there are infinitely many
surfaces interpolating those points (and of course, even more
approximating them). To alleviate this problem we must
somehow restrict the class of allowed surfaces, and give
some method of ranking the “plausibility” of a surface.

A classical way of ensuring that a problem has a unique
solution is to use a functional on the surface as a measure of
the “unreasonableness” of the surface, and to restrict the
class of allowed surfaces to make it a Hilbert or semi-Hilbert
space with the unreasonableness functional as norm or
semi-norm. This formulation ensures that there exists a unique
solution to the problem of finding a surface from the allowed
class which minimizes the functional (and hence is the most
reasonable). Throughout this paper we shall assume that
this type of formulation is appropriate for our problems.
Later in this section (see §2.3 and §2.4) we shall examine
some issues associated with the choice of the Hilbert space
and the unreasonableness norm.

§2.1  Formulation  of  the  VS  Interpolation 
Problem. 

For the VS interpolation problem we choose to define
“best approximation” in terms of minimal error. We assume
that error can be measured by a norm with respect to the
given class of functions. The norm might be the sup norm
(i.e. the maximal difference between the actual surface and
the approximation), or the $L_2$ norm (integral of the square of
the difference at each point). The error may be measured in
either a relative (e.g. error of 5%) or an absolute sense (e.g.
the surfaces never differ by more than 0.1 mm) depending on
the goals of the user. Finally the error may be measured in
the worst case, or on the average (with respect to some
measure).


Combining all of our assumptions, a precise formulation
of the visual surface interpolation problem
becomes:

Let $F_1$, the space of allowed surfaces, be a Hilbert or
semi-Hilbert space. Let $F_2$ be the elements of $F_1$
restricted to a finite domain $\Omega$ (since we are only
interested in recovering a portion of a possibly
infinite surface). Let $\Theta(f): F_1 \to \Re$ be a functional
measuring the “unreasonableness” of a surface (i.e.
the more reasonable a surface $f$, the smaller $\Theta(f)$),
where $\Theta$ is a norm or semi-norm on $F_1$. Now let
$N(f) = \{z_1, \ldots, z_k\} = \{f(x_1, y_1), \ldots, f(x_k, y_k)\}$ be the
given information (i.e. the input is $k$ depth values;
see Kender, Lee and Boult (1985) for a discussion of
allowed information). Then the visual surface
interpolation problem is to find $f^* \in F_1$ such that

$$\Theta(f^*) = \min_{g \in F_1} \Theta(g),$$

where the minimum is taken over surfaces $g$ that agree with
the given information, $N(g) = N(f)$.

Kender, Lee and Boult (1985) show (as a special case
of work on information based complexity; see Traub and
Woźniakowski (1980), or Traub, Wasilkowski and
Woźniakowski (1983)) that given the above formulation the
surface minimizing the functional $\Theta(f)$ (i.e. the most
reasonable surface) will also be the minimal error surface
with respect to the class $F_1$ (i.e. the most accurate solution)
for almost any error norm. This is what we refer to as the
solution in the abstract. All we need to do is to actually
calculate the surface of minimal norm itself.
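
To make the abstract statement concrete, here is a small discrete, one-dimensional analogue in Python. It is only a sketch under stated assumptions: the "surface" is a vector of `n` grid values, the unreasonableness functional is taken to be the sum of squared second differences (one possible smoothness semi-norm, chosen here for illustration and not prescribed by the text), and the interpolation constraints are enforced through a standard Lagrange-multiplier (KKT) linear system. The function name `interpolate_1d` is hypothetical, and at least two distinct sample positions are assumed so that the system is non-singular.

```python
import numpy as np

def interpolate_1d(n, sample_idx, sample_z):
    """Minimum-energy grid function passing through the given samples.

    Minimises sum((g[i-1] - 2 g[i] + g[i+1])**2) subject to
    g[sample_idx] == sample_z.
    """
    # Second-difference operator, shape (n - 2, n).
    L = np.zeros((n - 2, n))
    for r in range(n - 2):
        L[r, r:r + 3] = [1.0, -2.0, 1.0]
    # Sampling operator: picks out the constrained grid values.
    k = len(sample_idx)
    P = np.zeros((k, n))
    P[np.arange(k), sample_idx] = 1.0
    # KKT system for the equality-constrained quadratic problem.
    kkt = np.block([[2.0 * L.T @ L, P.T],
                    [P, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(n), np.asarray(sample_z, dtype=float)])
    return np.linalg.solve(kkt, rhs)[:n]
```

Samples that lie on a straight line are reproduced exactly by that line, since it has zero energy; for other data the result has no more energy than any other interpolant on the same grid, such as the piecewise-linear one.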


§2.2  Formulating  the  VS  Approximation  Problem. 

For the VS approximation problem we choose to define
“best approximation” in terms of both error and smoothness.
We assume that error can be measured by the sum, over all
information points, of the distances between the
approximating surface and the information points. For
smoothness we shall assume that this is measured by an
“unreasonableness” norm $\Theta(f)$; the smoother the surface the
smaller $\Theta(f)$. This results in a formulation similar to the one
for the VS interpolation problem, save there is a different
functional to be minimized. The formal definition is:


Let $F_1$, the space of allowed surfaces, be a Hilbert or
semi-Hilbert space. Let $F_2$ be the elements of $F_1$
restricted to a finite domain $\Omega$ (since we are only
interested in recovering a portion of a possibly
infinite surface). Let $\Theta(f): F_1 \to \Re$ be a functional
measuring the “unreasonableness” of a surface (i.e.
the more reasonable a surface $f$, the smaller $\Theta(f)$),
where $\Theta$ is a norm or semi-norm on $F_1$. Now let
$N(f) = \{z_1, \ldots, z_k\} = \{f(x_1, y_1), \ldots, f(x_k, y_k)\}$ be the
given information (i.e. the input is $k$ depth values).
Then the visual surface approximation problem is to
find $f^* \in F_1$ such that, given $\mu > 0$, $f^*$ minimizes
$S(g)$ with respect to all functions $g \in F_1$, where

$$S(g) = \mu \cdot \Theta(g)^2 + \sum_{i=1}^{k} | z_i - g(x_i, y_i) |$$


Unfortunately this formulation does not currently allow
direct application of the theories of information based
complexity to make statements about the optimality (in terms
of minimal error) of the algorithm. However, as with the
VS interpolation problem, the VS approximation problem
has now reduced to the problem of finding a surface from a
class of surfaces that minimizes a functional.
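
A discrete, one-dimensional analogue of such a functional can be minimized with a single linear solve, as the Python sketch below shows. Two substitutions are made for the sake of a short example and should be read as assumptions: the smoothness term is the sum of squared second differences, and the data term uses squared residuals instead of the absolute distances written in the definition, so that the minimizer satisfies a linear system. The names `approximate_1d` and `mu` are illustrative, and at least two distinct sample positions are assumed.

```python
import numpy as np

def approximate_1d(n, sample_idx, sample_z, mu=1.0):
    """Grid function minimising mu * smoothness + squared data misfit.

    Minimises mu * sum((g[i-1] - 2 g[i] + g[i+1])**2)
              + sum((sample_z - g[sample_idx])**2).
    """
    # Second-difference operator, shape (n - 2, n).
    L = np.zeros((n - 2, n))
    for r in range(n - 2):
        L[r, r:r + 3] = [1.0, -2.0, 1.0]
    # Sampling operator: picks out the grid values that have data.
    k = len(sample_idx)
    P = np.zeros((k, n))
    P[np.arange(k), sample_idx] = 1.0
    # Normal equations of the quadratic objective.
    A = mu * (L.T @ L) + P.T @ P
    b = P.T @ np.asarray(sample_z, dtype=float)
    return np.linalg.solve(A, b)
```

Large values of `mu` favour smooth (nearly linear) results that may miss the data; as `mu` approaches zero the result approaches an interpolating solution.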

Having given the general formulation of the problems,
and their solution in the abstract (as functions minimizing
$\Theta(f)$ and $S(f)$), it may seem that we have completed phases
one and two of the solution of our problem. However,
before we begin phase three, we should examine some of the
assumptions, implicit and explicit, in the formulation of the
problem.


§2.3 Discussion of Plausible Classes:

How Smooth Should They Be?
In the definition of our problem we did not precisely
define two important parameters: F_1, the class of allowable
surfaces, and Θ(·), the functional measuring the
“unreasonableness” of a surface. The two parameters are
tightly coupled (recall that Θ(·) is a norm or semi-norm over
F_1) and so must be considered simultaneously.


Most of the interesting (class, norm) pairs (F_1, Θ(·))
use derivatives of the “surface”, since we consider
smoothness a part of reasonableness, and degree of
differentiability can be considered a crude measure of
smoothness. So far in the problem formulation, we have not
explicitly assumed any amount of differentiability; we must
make this part of the class definition. Note that if the surface
is piecewise n times differentiable we can always assume the
region Ω, which is part of the formulation, is such that the
surface is smooth (i.e. it has differentiability of the assumed
order) within Ω, and the reconstruction of the surface can be
done piecewise. This however assumes that we know how
to segment the surface, a difficult and generally still open
problem.


The question arises as to the number of derivatives that
should be assumed. If the objects to be viewed are
generated in a particular environment, such as a CAD/CAM
manufacturing environment, we can often say with great
reliability how many derivatives the surface should have.
But in the unconstrained world we do not have such
amenities. One way to approximate “reasonable”
differentiability is to determine how many derivatives the human
visual system can detect. Although it is unlikely the human
visual system has an upper bound on the number of
derivatives it can detect (e.g. it can tell if a surface has no
more than 2 continuous derivatives), it may have a lower
bound (e.g. the surface has at least 3 continuous derivatives).
If we believe that the human visual system can detect
discontinuities in nth order derivatives, then our class should
have at least n+1 (possibly many more) continuous
derivatives, otherwise human observers might be able to
detect discontinuities in the supposedly smooth surface.
This does not rule out having explicit discontinuities, but
rather unintentional ones due to the reconstruction algorithm.
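To make "a discontinuity in the nth derivative" concrete, the sketch below uses the truncated power curve f(x) = max(x, 0)^n, chosen here only for illustration (it is not a stimulus described in the text). Its derivatives of order less than n are continuous at the knot x = 0, while the nth derivative jumps by n!. The function names are hypothetical.

```python
from math import factorial

def truncated_power_derivative(x, n, k):
    """k-th derivative (0 <= k <= n) of f(x) = max(x, 0)**n.
    For k = n the value at x = 0 is taken from the left branch."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if x <= 0.0:
        return 0.0
    return factorial(n) / factorial(n - k) * x ** (n - k)

def jump_at_knot(n, k, eps=1e-9):
    """Approximate jump of the k-th derivative across the knot at x = 0."""
    return (truncated_power_derivative(eps, n, k)
            - truncated_power_derivative(-eps, n, k))
```

For n = 3 the jumps in the derivatives of order 0, 1 and 2 vanish as eps shrinks, while the third derivative jumps by 3! = 6; a curve of this kind has exactly two continuous derivatives.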


To help answer the question of how many derivatives to
assume, we have proposed a psychology experiment to
measure the detectability of discontinuities in the derivatives
of one dimensional curves. Preliminary results suggest that
we need only consider discontinuities in the first, second and
third derivatives of a curve. Note that these results do not
directly apply to the selection of a class of allowable
surfaces. However, the number of variables in an
experiment to measure the detectability of discontinuities in
two dimensions makes such an experiment unwieldy. Some
of the questions that must be answered before such a 2D
experiment could be run include: What viewpoints do we
assume? What lighting conditions do we assume? Do we
texture the surface and if so how? Do we shade the surface?




If so, do we smoothly shade surfaces using a reflectance
model based on surface normals or do we use one based on
more derivatives of the surface? What method of display
will allow us the needed accuracy in shading to correctly
implement the reflectance model?

It seems intuitive to assume that the one dimensional
result is a lower bound for the two dimensional problem: if
you can detect discontinuities in the second derivative in a
one dimensional curve, then you can detect them in a two
dimensional surface as well. However, this intuition may be
faulty because in the one dimensional experiment we have
dense disparity data; but in two dimensions, due to sparsity
of image edges relative to the scene, we may have only
sparse depth data (although we may also have shading and
texture data). Differences in the available information
confound our intuition. If depth or disparity data is most
important for detection of discontinuities, then the 2D
problem is more difficult: humans would not be able to
detect the same discontinuities they could in 1D. If, on the
other hand, texture and shading aid in the detection of
discontinuities, then the 2D problem may be easier. Other
problems in the extension of the one dimensional result into
two dimensions include: What is the analog of a 1D point
discontinuity in 2D, a point or a line? If it is a line, then
how does detectability of a point discontinuity in 2D relate to
the detectability of discontinuities in 1D? Does the
“direction” of the discontinuities matter, or is the detectability
isotropic? To what extent do illumination, shading and
texture add or subtract from the ability to detect the
discontinuity? Despite all the problems relating the one
dimensional result to the two dimensional problem, the one
dimensional results do directly apply when viewing the
visible edge of a surface or a surface with ruled markings on it.

Although we do not have solid reasons for doing so, we
shall follow our intuition and use the one dimensional result
as a lower bound for the problem of choosing the
appropriate two dimensional class for the VS interpolation or
VS approximation problem.


§2.4 Discussion of Plausible Classes:

Things to Consider and Examples.

Even with constraints from the 1D experiment
mentioned above, we are still faced with an infinitude of
(class, norm) pairs from which to choose. (If the world is
sufficiently restricted, as it may be in an industrial setting,
then the choice of class may be easily made.) In general, to
make our choice we might appeal to physical analogies (e.g.
the norm should measure physical bending energy of an
ideal thin plate), personal biases (e.g. the space and norm
should be invariant ... or the definition of the class should
be simple) or other ad hoc assumptions (e.g. the (class, norm)
should be such that the optimal surface exactly reconstructs
low degree polynomials). However, most of these assumptions
still leave infinitely many (class, norm) pairs satisfying them;
they don't significantly cut down the choices.

How then do we choose? We propose to decide on the
basis of a psychology experiment which subjectively ranks
possible classes. Because we know the optimal error
algorithm to recover surfaces in the (class, norm) pairs, any
perceived difference between the various reconstructions
must be due to differences in the model assumptions. Of
course, we cannot subjectively rank a continuum of classes
and norms, and so will use a finite subset of the infinite
space of (class, norm) pairs covering a wide range of
assumptions. It is quite possible that we shall find a number
of the classes to be almost indistinguishable (or
distinguishable, but none clearly a best fit). If this is the
case then we can make our choice based on other
considerations such as the computational complexity, some
numerical properties, or some ad hoc assumption.

In the formulations of Grimson (1981) and
Terzopoulos (1984), there was no assumption of a particular
class of surfaces, just the assumption of a particular norm
and the differentiability needed for this norm. Kender,
Lee and Boult (1985) assume a particular class and
norm. We now present a number of alternative classes and
their associated norms (all of these will be used in our
psychology experiments). For each class we briefly give
some of the (ad hoc) reasons why one might choose this
class. Each of these classes could be used in the formulation
of our problems and each would offer different, at least
mathematically, answers to those problems.




First we present notation that will aid in the discussion
of the classes and norms. We shall let X refer to an arbitrary
class. We shall use the notation D_x^i to represent the
differential operator ∂^i(·)/∂x^i, and D_y^j for ∂^j(·)/∂y^j. We
drop i or j if they are equal to 1. In what follows the
notation D_x f(α,y) should be interpreted as

\[ \left. \frac{\partial f}{\partial x}(x,y) \right|_{(\alpha,y)} \]

(similar for derivatives with respect to y or for functions of
one variable). Also D_x D_y f(α,β) should be interpreted as D_x
applied to D_y f(x,β) and evaluated at α. Finally we use the
standard notation, Π_m, to denote the space of bivariate
polynomials with degree ≤ m in each variable.


§2.4.1 A simple family of classes using an isotropic,
physically meaningful semi-norm defined over ℜ².

Duchon (1976) called this infinite family of classes
D^{-m}L², and it is defined as the space of functions which have
all partial derivatives of order m in L²(ℜ²). Then given that
m ≥ 2 we have that D^{-m}L²/Π_{m−1} is a Hilbert space with the
mth Sobolev semi-norm as a norm. This class is defined to
be as general as possible while allowing the use of the mth
Sobolev semi-norm, which has a physical interpretation. This
class is large, and therefore the optimal algorithm will have
larger error than in some of the following, smaller classes.
Furthermore, because the class is defined over ℜ² we
cannot use prior knowledge about the boundary of the
surfaces in the class to reduce our error.


A number of the classes we shall examine have a related
semi-norm, the Sobolev semi-norm, Ψ_m(·), defined as

\[ \Psi_m(f) = \left( \iint_{\Re^2} \sum_{i+j=m} \binom{m}{i} \left| D_x^i D_y^j f \right|^2 dx\, dy \right)^{1/2} \tag{2.1} \]

(Surfaces minimizing this semi-norm are at times referred to
as the mth elastic medium.) For m = 1 this has the physical
interpretation as the area of a membrane passing through the
data; for m = 2, it has the interpretation as the amount of
bending energy in a thin elastic plate passing through known
points. Hence minimizing Ψ_2(f) is equivalent to finding a
function which passes through the data points and has
minimum bending energy. The reader familiar with the
works of Grimson (1981) and Terzopoulos (1984) will
recognize this as the same functional that their relaxation
algorithms attempt to minimize. Grimson argues that this
functional is minimized by the human visual system, though
he does not present any psychophysical basis for his
conclusions. Note his arguments only consider isotropic
second order operators.
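As a worked instance of the Sobolev semi-norm, and assuming it carries the usual binomial weights on the mixed partial derivatives (an assumption made here so that the m = 2 case matches the thin plate functional named above), the two cases discussed in the text expand to

\[
\Psi_1(f)^2 = \iint_{\Re^2} \left( f_x^2 + f_y^2 \right) dx\, dy ,
\qquad
\Psi_2(f)^2 = \iint_{\Re^2} \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx\, dy .
\]

Every plane f(x,y) = a + bx + cy has f_{xx} = f_{xy} = f_{yy} = 0 and hence Ψ_2(f) = 0, which illustrates why Ψ_2 is only a semi-norm: its null space is the set of polynomials of degree at most one.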


Before we begin giving examples, we would like to
point out that there are other classes not presented here that
are plausible classes for the problem, and shall be
considered part of the psychology experiment. Our intention
is to indicate that even on purely mathematical grounds,
many classes for interpolating or approximating surfaces
exist. The reader interested in more classes should see
Boult (1985c) or Boult (1986) for more details, including the
reproducing kernels for the four families of classes we shall
now consider.


For the choice m = 2, this class is implicitly considered in
Grimson (1981) and Terzopoulos (1984). It has their choice
of an unreasonableness functional, and requires the minimal
assumptions necessary to use that functional.

§2.4.2 A family of classes using an isotropic,
physically meaningful semi-norm defined over ℜ²
with a somewhat band limited function space.

This family of classes was first used and discussed in
Duchon (1976). The family is defined with 2 infinite
parameters (one of them continuous) and has the property
that we also assume that the Fourier transform of all
functions in the space is limited. (Given the discrete nature
of the computer vision domain, band limiting the function
space seems appropriate; we do not want surfaces
oscillating wildly in between our discrete samples.)

To be precise we define D^{-m}H^s, for s ≤ 1 (s real), to be
the space of functions which have all partial derivatives of
order m in H^s, where H^s is the Hilbert space of
tempered distributions v such that the Fourier
transform, v̂, of v satisfies

\[ \iint_{\Re^2} |\tau|^{2s} \, |\hat{v}(\tau)|^2 \, d\tau < \infty \]

Then Duchon shows that D^{-m}H^s is a semi-Hilbert space
with Ψ_m(·) as the associated semi-norm. These classes are
only slightly smaller (depending on s) than the classes D^{-m}




L², mentioned above. Again because the class is defined
over ℜ² we cannot use prior knowledge about the boundary
of the surfaces in the class to reduce our error.

§2.4.3 A class using an isotropic,
physically meaningful semi-norm
defined on a finite disk.

This class of functions uses the second Sobolev semi-norm
defined by (2.1) for m = 2 and has as its domain a finite
disk. The advantage of being defined on a finite disk is that
it results in a smaller class, thus a possibly smaller error.

Let Ω be a disk with radius r centered at the origin and
let H be the set of twice differentiable continuous functions
f: Ω → ℜ, such that the first and second order partial
derivatives are elements of L²(Ω). Define:

\[ \mathcal{H} = \left\{ f \in H : \int_{\partial\Omega} f(x,y) = \int_{\partial\Omega} x\, f(x,y) = \int_{\partial\Omega} y\, f(x,y) = 0 \right\} \]

Then Atteia (1966) shows that ℋ is a semi-Hilbert space
with Ψ_2(·) a semi-norm with null space {1, x, y} (i.e. ℋ/Π_1
is a true Hilbert space with Ψ_2 as a norm). One can
interpret the definition of the class as limiting the space to
functions that are “balanced” with respect to the boundary.
In fact, if the graph of a function on the boundary of the disk
were considered to be a thin wire, this class consists of those
functions whose center of gravity for the wire is exactly the
origin.


§2.4.4 A class using a non-isotropic semi-norm,
defined on an arbitrary rectangle in ℜ²,
and a norm incorporating prior knowledge on
a finite set of points.

This is one of a number of families that are not
necessarily isotropic (though for brevity, we do not consider
the others). If there is some physiological or psychological
reason (e.g. the horizontal positioning of the eyes) that
results in human perception of smoothness being
non-isotropic, then we might use one of these classes.
Furthermore the norm in these classes can be tailored to
reflect explicit knowledge about how the class of surfaces
behaves on a finite set of points. Let R^{m,n} be the Hilbert
space containing functions defined on Ω = [a,b]×[c,d] such
that for all f ∈ R^{m,n} we have:

D_x^i f is continuous for i = 1, ..., m−1;

D_y^j f is continuous for j = 1, ..., n−1;

D_x^{m−1} f and D_y^{n−1} f are absolutely continuous;
and finally

D_x^m f and D_y^n f are in L²(Ω).

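Let {x_1, ..., x_m} and {y_1, ..., y_n} be distinct points in
[a,b] and [c,d] respectively. Then Arthur (1974) shows
that (f,f)^{1/2} is a semi-norm on R^{m,n}, where the inner
product (f,g) on R^{m,n} is given by:

\[
\begin{aligned}
(f,g) ={}& \sum_{i=1}^{m} \sum_{j=1}^{n} f(x_i,y_j)\, g(x_i,y_j) \\
&+ \sum_{j=1}^{n} \int_a^b D_x^m f(x,y_j)\, D_x^m g(x,y_j)\, dx \\
&+ \sum_{i=1}^{m} \int_c^d D_y^n f(x_i,y)\, D_y^n g(x_i,y)\, dy \\
&+ \int_a^b \!\! \int_c^d D_x^m D_y^n f(x,y)\, D_x^m D_y^n g(x,y)\, dy\, dx
\end{aligned}
\]

The null space associated with this semi-norm is

\[ \Pi_{m \times n} = \Big\{ f \in R^{m,n} \text{ such that } f(x,y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} p_{ij}\, x^i y^j, \;\; p_{ij} \text{ constant} \Big\} \]

which is of finite dimension mn.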


These classes are but a few of the many that exist.
Others have yet other properties (e.g. the norm is such that
the result of minimizing it exactly reconstructs low degree
polynomials; regions of different smoothness can be
incorporated into the class definition; more prior knowledge
can go into the definition of the norm; etc.). Therefore,
before the first and second phases of algorithm development
are complete, a rationale for the choice of the (class, norm)
pair must be found. It should be clear that the choices are
many, and even an appeal to human perception is ultimately
a statement of preference.




§3 Analysis of Alternative Realizations

The third phase in the solution of a problem is the
analysis of alternative realizations of the solution. There are
a number of ways that one can attempt to find a surface from
a Hilbert space that minimizes a functional. Of these we
shall examine four different methods and discuss their
advantages and disadvantages. These methods and the
subsections in which they are discussed are:

§3.1 Discretization of the problem using variational
principles and then discrete minimization using
classical minimization techniques, as in Grimson
(1981);


§3.2 Discretization of a partial differential equation
formulation of the problem, again using a variational
approach, and then use of discrete finite element
approximation solved with a multigrid approach, as
in Terzopoulos (1984);


§3.3 Direct calculation using reproducing kernel splines
for the Hilbert space including the null space of the
norm, as done by Duchon (1976). (We shall refer
to these as semi-reproducing kernel splines);


§3.4 The separation of the Hilbert space and the null
space of the norm with direct calculation using the
reproducing kernel of the resulting quotient space,
as was done by Meinguet (1979). (We shall refer
to these as quotient reproducing kernel splines).


It is beyond the scope of this investigation to discuss all
the details of each of these realizations; we shall only discuss
the relative advantages and disadvantages. For a more
thorough comparison of Grimson (1981) and the quotient
reproducing kernel spline approach, see Boult (1985b).


§3.1 Discretization and classical minimization.

Grimson (1981) proposed realizations that solve discrete
versions of the VS problems (one method each for the VS
interpolation and the VS approximation problems) by
employing variants on classical minimization techniques.
Some advantages of his methods are:

+ The minimization techniques employed are reasonably
well studied (e.g. they are known to be numerically
stable and there are upper bounds on the rate of
convergence, etc.);

+ They employ “computational modules” that are relatively
simple;

+ They may be biologically feasible and may even model the
human visual system;

+ Almost all calculations can be done in a purely local
manner.


However, the methods also have the following
disadvantages:

- The rate of convergence for the iterative relaxation used
in minimization is slow;

- A good stopping criterion for the iteration is lacking;

- The use of a grid representation limits the applicability
of the method, and also makes it dependent on the
scale, translation, and rotation of the data;

- There is difficulty in deriving the computational
modules if the boundary of the object is irregular;

- There is difficulty in adapting the realization to other
unreasonableness norms;

- There is “slower convergence” away from information
points and near the grid boundary;

- Because it is a discrete version of a continuous
problem, there may not exist a unique solution to the
discrete problem;

- The minimization does not take into consideration a
particular space of smooth functions; hence, it may
arrive at a solution that is not even from the appropriate
functional class;

- If one changes the measure of reasonableness, the
computational modules must be rederived and the
numerical properties of the minimization must be
reestablished.
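The locality and the slow propagation noted in the lists above can be seen in a toy relaxation. The sketch below is not Grimson's algorithm: as a simplifying assumption it relaxes the membrane (m = 1) energy with Jacobi sweeps, so that each free grid node is repeatedly replaced by the mean of its four neighbours while the nodes carrying depth samples stay pinned. The function name and the (row, column, depth) sample format are hypothetical.

```python
import numpy as np

def relax_membrane(shape, samples, sweeps):
    """Jacobi relaxation on a grid: every free node moves to the mean of its
    four neighbours (borders are replicated); sampled nodes stay pinned."""
    grid = np.zeros(shape)
    pinned = np.zeros(shape, dtype=bool)
    for r, c, z in samples:
        grid[r, c] = z
        pinned[r, c] = True
    for _ in range(sweeps):
        padded = np.pad(grid, 1, mode="edge")
        mean = 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1]
                       + padded[1:-1, :-2] + padded[1:-1, 2:])
        grid = np.where(pinned, grid, mean)   # purely local update
    return grid
```

Because each sweep moves information by only one grid node, a node four steps from the nearest sample is still untouched after three sweeps; many more sweeps are needed before the whole grid settles, which is the behaviour described as slow convergence away from information points.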


§3.2 Discretization and multi-grid minimization.

Terzopoulos (1984) proposed realizations that also solve
discrete versions of the problems. For a particular
reasonableness measure, he relates the problem to a physical
problem (thin plate interpolation) and uses variational
principles and finite element-like methods to define an
algorithm to calculate the surface of minimal energy (minimal
norm). His methods employ a multi-level grid approach to
the minimization stage of the problem, greatly increasing the
computational efficiency of the algorithm. The advantages
of Terzopoulos's realizations are:

+ The methods are far more computationally efficient
than Grimson's methods;




+ The methods have the ability to measure error
differently at each data point;

+ They have the ability to deal with discontinuities (given
a priori) in the surface and its orientation;

+ They use a pyramid representation of the surface,
which is convenient in some vision applications;

+ All the computation on each level of the pyramid is
local;

+ There is only simple communication between levels
(with a two directional flow).


Some of the disadvantages of Terzopoulos's methods are:

- A good stopping criterion for the iteration is lacking;

- The use of a grid representation limits the applicability
of the method, and also makes it dependent on the
scale, translation, and rotation of the data;

- There is difficulty in adapting the realization to other
unreasonableness norms;

- It may be difficult to derive the computational
modules if the boundary of the object or the known
discontinuities is irregular;

- There is “slower convergence” away from information
points and near the grid boundary;

- The numerical stability and convergence rates for the
multi-grid approach are not apparent;

- The methods solve a discrete version of the problem,
suffering from the same problems as Grimson's.
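The benefit of working on several grid levels can be illustrated with a deliberately simple coarse-to-fine sketch. It is not Terzopoulos's multi-level scheme: as stated assumptions it treats a one dimensional membrane problem, relaxes on a coarse grid first, and then uses the interpolated coarse answer as the starting point on the fine grid (nested iteration, with no fine-to-coarse correction). All names are hypothetical.

```python
import numpy as np

def relax_1d(u, pinned, sweeps):
    """Gauss-Seidel sweeps: each free interior node moves to the mean of its
    two neighbours; pinned nodes and the two end nodes are left unchanged."""
    u = u.copy()
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            if not pinned[i]:
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

def prolong(coarse):
    """Linear interpolation from n coarse nodes to 2n - 1 fine nodes."""
    fine = np.empty(2 * len(coarse) - 1)
    fine[0::2] = coarse
    fine[1::2] = 0.5 * (coarse[:-1] + coarse[1:])
    return fine

def tent_problem(n):
    """Zero-initialised grid of n (odd) nodes with depth samples 0, 1, 0
    pinned at the two ends and at the midpoint."""
    u = np.zeros(n)
    pinned = np.zeros(n, dtype=bool)
    pinned[[0, n // 2, n - 1]] = True
    u[n // 2] = 1.0
    return u, pinned
```

Relaxing the 17-node problem to convergence and prolonging it gives a 33-node starting guess that is already very close to the answer, whereas the same small number of fine-grid sweeps from a zero start leaves a large error. The coarse sweeps are also cheaper, since they touch half as many nodes.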


§3.3 Minimization with Semi-Reproducing Kernel
Splines

The realization using semi-reproducing kernel splines
uses the mathematical properties of the kernels to exactly
solve the continuous problems we formulated in section 2.
Semi-reproducing kernel splines are defined in terms of the
reproducing kernels for the semi-Hilbert space, see Duchon
(1976), Boult (1985a, 1986), or Kender, Lee, and Boult
(1985). The major computational component of the method
is the solution of a dense linear system of equations (see
section 4). Advantages of this approach include:

+ The solution of linear systems is a well understood
topic;

+ The algorithm results in a functional form for the
surface allowing symbolic calculations (e.g.
differentiation or integration);

+ The method is independent of the shape of the
boundary of the object;

+ There is no problem with “slower convergence” away
from information points, or near the object boundary;

+ The algorithm can efficiently allow updating the
information (e.g. adding new data points, removing a
point or changing a previous information value);

+ No iteration is needed - the amount of computation is
fixed and depends only on the position and number
(not value) of the information;

+ The kernels are independent of the information;

+ The method is easily adapted to other spaces of
functions;

+ Because it solves the continuous problem, it is
guaranteed a unique solution;

+ If the norm is isotropic, then the kernel is rotationally,
translationally and scale invariant;

+ For sparse data this realization is more efficient than
Grimson's or Terzopoulos's approach;

+ The linear system is symmetric; if the data falls on a
regular grid the matrix is block Toeplitz, which admits
particularly efficient solutions;

+ The kernels are simple, relative to the kernels of the
true reproducing kernel representation. (This is not
apparent from our discussion, see Boult (1985a) for
more details.)

The disadvantages are:

- The resulting linear system is dense and indefinite,
which limits the approaches we can use to solve it (in
fact the system will always have d negative
eigenvalues, where d is the cardinality of the
nullspace);

- Although reproducing kernels exist for all Hilbert
spaces, deriving them may be difficult. (However
they are known for a large number of interesting
classes, including all classes presented in section 2);

- This method may not be biologically feasible, due to
the implicit global communication demands.

§3.4 Minimization with Quotient Reproducing
Kernel Splines

A fourth realization of the optimal algorithm uses
quotient reproducing kernel splines. The splines are defined
in terms of the reproducing kernels for the quotient space
- the original space with the null space of the norm removed.
This is very similar to the realization using kernels for the
whole space, except that the kernels are more complicated
(see Meinguet (1979a, 1979b) or Boult (1985a, 1985b,
1986) for more details). With respect to the other




realizations, this approach has all the advantages and
disadvantages of the semi-reproducing kernel splines. The
main advantage (over the semi-reproducing kernel splines)
is:

+ The resulting linear system is positive definite. This is
an important property from the numerical analysis point
of view, insuring the numerical stability of algorithms
for the solution of the system, and increasing the
number of algorithms that can be used to solve it.

The disadvantages are:

- The condition number of the system appears to be
significantly larger than that for the semi-reproducing
kernel splines;

- The kernels themselves are much more complex, as
compared to the kernel functions for the semi-reproducing
kernel representation. Therefore the time
required for calculation of the surface at each point is
greater;

- The methods must explicitly calculate a unisolvent set
of data, and functions interpolating the unisolvent set
of data; in general, these cannot be precomputed (see
§4.2).


§4 Using Reproducing Kernels.

Now we are ready to explore the fourth phase of
problem solution: the implementation details. In what
follows we first examine two separate reproducing kernel
based methods (described in §3.3 and §3.4 above), giving
the necessary linear systems in sections §4.1 and §4.2. In
§4.3 we examine six different imaging situations briefly,
describing which of the two methods and which
implementation specific details are most suitable to the
situation.

For reproducing kernel based methods to be applicable it
is sufficient that F_1 (defined in section 2) be semi-Hilbert
and Θ(f) the associated semi-norm with null space Π_{m−1}. To
insure uniqueness of the solution we must assume that N_k
contains a Π_{m−1} unisolvent subset (i.e. there exists a set J (a
subset of 1...k) of indices, and a set of information
points {(x_j,y_j)}_{j∈J} and associated information values z_j such
that there exists a unique p_J(x,y) ∈ Π_{m−1} satisfying the
condition p_J(x_j,y_j) = z_j for all j in J). (For example, for the
case presented in Kender, Lee, and Boult (1985) (see also
§2.4.4), this amounts to having four noncoplanar points.)


§4.1 Semi-Reproducing Kernel Splines.

Duchon, extending the work of Atteia (1976) to the case
of semi-Hilbert spaces, noted that the solution to the problem
of finding an interpolating function (or an approximating
function as defined in section 2) of minimal norm in the
semi-Hilbert setting could be written down in terms of
K((x,y);(s,t)), the reproducing kernel of F_1. We give the
derivation in general terms.

Given a reproducing kernel K((x,y);(s,t)) for F_1 we can
write the interpolatory spline that minimizes Θ(f) as

\[ \sigma_1(x,y) = \sum_{i=1}^{k} \alpha_i \, K((x,y);(x_i,y_i)) + \sum_{i=1}^{d} \beta_i \, q_i(x,y) \tag{4.1} \]

where {q_i}, i = 1, ..., d (d = cardinality of the set J) is a basis
for Π_{m−1}, the null space of Θ(f). The coefficients {α_i} and
{β_i} of the interpolating spline can be determined from the
solution of a (k+d) by (k+d) dense linear system.


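Recalling the definition of N(f) = {z_1, ..., z_k} =
{f(x_1,y_1), ..., f(x_k,y_k)}, where (x_i,y_i) is the location of the
function (depth) values, we can express this linear system as
follows:

\[
\begin{aligned}
\sum_{i=1}^{k} \alpha_i \, K((x_j,y_j);(x_i,y_i)) + \sum_{i=1}^{d} \beta_i \, q_i(x_j,y_j) &= z_j, & j &= 1, \ldots, k \\
\text{and} \qquad \sum_{i=1}^{k} \alpha_i \, q_j(x_i,y_i) &= 0, & j &= 1, \ldots, d
\end{aligned}
\tag{4.2}
\]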

Duchon (1976) shows that this representation yields a
σ_1 that minimizes the functional Θ(σ) and is unique if the
set z_i = f(x_i,y_i), i = 1, ..., k contains a Π_{m−1} unisolvent
subset. This then solves the VS interpolation problem.
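The interpolation system can be assembled and solved directly once a kernel is fixed. The sketch below is one possible instance, under stated assumptions: it takes K = r² log r (the thin plate kernel commonly associated with the m = 2 semi-norm) and the null space basis q = {1, x, y}, so d = 3. The text does not prescribe this kernel, and the function names are hypothetical.

```python
import numpy as np

def tps_kernel(p, q):
    """Assumed kernel K(p, q) = r^2 log r with r = |p - q| (0 when r = 0)."""
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return 0.5 * d2 * np.log(np.where(d2 > 0.0, d2, 1.0))

def null_basis(p):
    """q_1, q_2, q_3 = 1, x, y evaluated at the rows of p (d = 3)."""
    return np.column_stack([np.ones(len(p)), p])

def build_system(pts):
    """(k+d) x (k+d) matrix [[K, Q], [Q^T, 0]] of the interpolation system."""
    K, Q = tps_kernel(pts, pts), null_basis(pts)
    d = Q.shape[1]
    return np.block([[K, Q], [Q.T, np.zeros((d, d))]])

def fit_interpolating_spline(pts, z):
    """Solve for the kernel coefficients alpha and null space coefficients beta."""
    k = len(pts)
    A = build_system(pts)
    sol = np.linalg.solve(A, np.concatenate([z, np.zeros(A.shape[0] - k)]))
    return sol[:k], sol[k:]

def evaluate(pts, alpha, beta, query):
    """sigma_1 at the query points: kernel part plus null space part."""
    return tps_kernel(query, pts) @ alpha + null_basis(query) @ beta
```

With k depth samples at non-collinear locations, `fit_interpolating_spline` returns a surface that passes through every sample, and planar data is reproduced with all kernel coefficients equal to zero. The eigenvalues of `build_system(pts)` can be inspected with `numpy.linalg.eigvalsh` to see that the matrix is indefinite.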




If we wish to solve the VS approximation problem we
have the same representation (4.1) for the spline; however,
the coefficients $\{\alpha_i\}$ and $\{\beta_i\}$ of the approximating
(smoothing) spline can be determined from the solution of a
(k+d) by (k+d) dense linear system given by:

\[
\sum_{i=1}^{k} \alpha_i \, K((x_j,y_j);(x_i,y_i)) + C_K \, \mu \, \alpha_j + \sum_{i=1}^{d} \beta_i \, q_i(x_j,y_j) = z_j, \qquad j = 1,\ldots,k
\tag{4.3}
\]

\[
\sum_{i=1}^{k} \alpha_i \, q_j(x_i,y_i) = 0, \qquad j = 1,\ldots,d,
\]

where $q_j$ is as before, $C_K$ is a constant dependent on the
kernel and $\mu$ is the parameter used in the definition of the
approximation problem. For a more complete treatment,
including examples of systems for particular kernels, see
Boult (1985a) or Boult (1986).
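Both linear systems of this subsection can be written compactly in block form. The display below is only a restatement. The matrix names $\mathbf{K}$ and $\mathbf{Q}$ are introduced here for convenience, $C_K$ denotes the kernel-dependent constant and $\mu$ the approximation parameter:

\[
\begin{pmatrix} \mathbf{K} + C_K \mu I_k & \mathbf{Q} \\ \mathbf{Q}^T & 0 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} z \\ 0 \end{pmatrix},
\qquad
\mathbf{K}_{ji} = K((x_j,y_j);(x_i,y_i)), \quad
\mathbf{Q}_{ji} = q_i(x_j,y_j).
\]

Setting $\mu = 0$ recovers the interpolation system. The matrix is symmetric whenever the kernel is symmetric, but it is not positive definite because of the zero block.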

§4.2  Quotient Reproducing Kernel Splines.

Another realization of minimizing a functional over a
Hilbert space is due to Meinguet. We use the fact that we
can separate the space $F_1$ into $X_0 \oplus \Pi_{m-1}$, where $\Pi_{m-1}$ is the
null space of $\Theta(\cdot)$, $X_0 = \{g \in F_1 : g(x_j,y_j) = 0,\ \forall j \in J\}$
and $\oplus$ is a (topological) direct sum. (Recall that J is the set
of indices of the $\Pi_{m-1}$ unisolvent subset of the information.)
Then $X_0$ is a Hilbert space with $\Theta(\cdot)$ as a norm (not a semi-norm).
Then given the reproducing kernel $K_M((s,t);(x,y))$ of
$X_0$ (which can be expressed in terms of the reproducing
kernel $K((s,t);(x,y))$ of $F_1$ and the functions $q_j(x,y)$) the
interpolating spline is given by:

\[
\sigma_2(x,y) = \sum_{i \in I} \alpha_i \, K_M((x_i,y_i);(x,y)) + \sum_{j \in J} \lambda_j \, q_j(x,y)
\tag{4.4}
\]

where the size of J is equal to d, the dimension of the
null space, I is the set of the remaining k-d indices, and the
coefficients $\alpha_i$ can be calculated from the
(k-d) by (k-d) dense linear system given by:

\[
\sum_{i \in I} \alpha_i \, K_M((x_l,y_l);(x_i,y_i)) = z_l - \sum_{j \in J} \lambda_j \, q_j(x_l,y_l), \qquad \forall\, l \in I.
\tag{4.5}
\]

For a more complete treatment, including examples of
the actual systems arising in practice, see Boult (1985a) or
Boult (1986). For a thorough comparison of this method
with that of Grimson (1981) see Boult (1985b).


Note  that  although  equations  (4.1)  and  (4.4)  seem 
different,  if  the  class  of  functions  and  the  norm  are  the  same, 
then  the  resulting  splines  are  exactly  the  same  function!  This 
follows  directly  from  the  uniqueness  of  the  function 
minimizing the functional $\Theta(\cdot)$ in the class $F_1$.


§4.3  Examples in Different Situations.

In this section we shall look at which representation,
(4.1) or (4.4), is best suited to various imaging situations. The
methods lend themselves to different implementations
depending on the imaging conditions and user priorities. We
present six different imaging situations and examine what
implementation details we might use to solve the resulting
linear system in these situations.

§4.3.1  Data on a Large Fixed Grid:
Speed Most Important

In this example we assume the user has the means to
gather samples of the surface depth on a regular grid pattern
(e.g. with an accurate laser rangefinder). If the user is
interested in pure speed then the representation (4.1) is better
suited to the problem. Then the linear system ((4.2) or
(4.3)) can be inverted (or decomposed) as a precomputation
and stored. (This costs time proportional to $k^3$ and space
proportional to $k^2$.) When the data comes in from the
rangefinder, the user simply forms a matrix-vector product (time
$n^2$) to recover the spline coefficients used in (4.1).
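The precompute-then-apply pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code. The kernel is again taken to be the thin-plate form r^2 log r with null space 1, x, y, the grid is a toy 4 by 4 one, the depth data are made up, and the names `precompute_solver` and `coefficients_for_frame` are hypothetical.

```python
import numpy as np

def precompute_solver(A):
    """One-time work, cubic in the matrix size: invert the fixed system matrix."""
    return np.linalg.inv(A)

def coefficients_for_frame(A_inv, z, d):
    """Per-frame work, quadratic: one matrix-vector product with the stored inverse."""
    rhs = np.concatenate([z, np.zeros(d)])
    return A_inv @ rhs

# Fixed 4 x 4 sampling grid; the data locations never change between frames.
xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0))
pts = np.column_stack([xs.ravel(), ys.ravel()])
r2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = 0.5 * r2 * np.log(np.where(r2 > 0, r2, 1.0))    # assumed kernel r^2 log r
Q = np.column_stack([np.ones(len(pts)), pts])       # assumed null-space basis 1, x, y
k, d = Q.shape
A = np.block([[K, Q], [Q.T, np.zeros((d, d))]])

A_inv = precompute_solver(A)                         # done once, then stored
for frame in range(3):                               # new depth values arrive per frame
    z = np.cos(pts[:, 0] + frame) + 0.1 * pts[:, 1]
    coeffs = coefficients_for_frame(A_inv, z, d)     # alpha (k values) then beta (d values)
```

Storing a factorization instead of an explicit inverse would usually be preferred numerically. The explicit inverse is used here only because it mirrors the text most directly.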


§4.3.2  Data on a Large Fixed Grid:
Space Limited.

In this example we assume the user has samples of the
surface depth on a regular grid pattern plus a few extra
points (again maybe from an accurate laser rangefinder). We
assume the user has limited memory and cannot afford to
store and retrieve the inverse of a k by k linear system. In
this case the better representation is (4.4); we need the
positive definiteness and block Toeplitz structure of a
component of the linear system (4.3). We can then use a
result of Lee (1985) that uses a conjugate gradient iterative
approximation to the coefficients in time at most $O(n^2 \log n)$
and space $O(n)$. This algorithm takes advantage of the block
Toeplitz structure in the component of the system (using
FFTs) to do a matrix-vector product in time $O(n \log n)$
instead of $O(n^2)$.
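The two ingredients named above, an FFT-based structured matrix-vector product and a conjugate gradient iteration that touches the matrix only through that product, can be sketched in a simplified setting. The sketch assumes a plain symmetric positive definite Toeplitz matrix (one-dimensional, not block Toeplitz) defined by a made-up first column. It is not the algorithm of Lee (1985), only an illustration of the idea. The function names are hypothetical.

```python
import numpy as np

def toeplitz_matvec(first_col, x):
    """y = T x for a symmetric Toeplitz T, via circulant embedding and FFTs.

    Costs O(n log n) time and stores only O(n) numbers (the first column).
    """
    n = len(first_col)
    # First column of a 2n x 2n circulant matrix whose top-left block is T.
    c = np.concatenate([first_col, [0.0], first_col[:0:-1]])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, 2 * n))
    return y[:n].real

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=500):
    """Plain conjugate gradient for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 64
first_col = 0.5 ** np.arange(n)          # made-up SPD Toeplitz matrix, T[i, j] = 0.5^|i-j|
b = np.ones(n)
x = conjugate_gradient(lambda v: toeplitz_matvec(first_col, v), b)

# Dense copy of T, built only to check the result; the solver never forms it.
idx = np.arange(n)
T_dense = first_col[np.abs(idx[:, None] - idx[None, :])]
```

For a block Toeplitz matrix arising from a two-dimensional grid, the same embedding idea applies with two-dimensional FFTs.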

§4.3.3  Stereo Data Gathered in Parallel:
No Update in the Location of Data
and Accuracy Most Important

In this example we assume that we are given all the
depth data at one time, and we will not wish to change the
location of any of the data (though we may wish to change
the value of the data if we change a stereo match). Here we
are interested in achieving the most accurate solution. We
would use (4.2) as our linear system (it has a substantially
lower condition number), and solve it with Gaussian
elimination with full pivoting (time $(1/3)\,n^3$) and a few steps
of iterative refinement (cost $n^2$ per step). This also has the
advantage that the kernels are simple, and so is the actual
program to solve the linear system.
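Iterative refinement, mentioned above, repeatedly solves for a correction using the residual of the current solution. The sketch below is only illustrative and makes several assumptions that differ from the text. It relies on `numpy.linalg.solve`, which uses partial rather than full pivoting. It refactors the matrix on every call, whereas a real implementation would reuse the stored factors so that each step costs about n^2. It deliberately performs the solves in single precision so that the effect of refinement is visible. The test matrix is made up.

```python
import numpy as np

def solve_with_refinement(A, b, steps=3):
    """Low-precision solve followed by a few residual-correction steps."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(steps):
        residual = b - A @ x                               # residual in double precision
        correction = np.linalg.solve(A32, residual.astype(np.float32))
        x += correction.astype(np.float64)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 50)) + 50.0 * np.eye(50)          # made-up well-conditioned matrix
b = rng.normal(size=50)

x_plain = solve_with_refinement(A, b, steps=0)              # no refinement
x_refined = solve_with_refinement(A, b, steps=3)
err_plain = np.linalg.norm(A @ x_plain - b)
err_refined = np.linalg.norm(A @ x_refined - b)
```

On this example the refined residual `err_refined` is several orders of magnitude smaller than `err_plain`.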

§4.3.4  Stereo Data Gathered in Parallel:
No Update in the Location of Data,
and Speed Most Important.

In this example we assume that we are given all the
depth data at one time, and will not wish to change the
location of any of the data although we may wish to change
the value of the data, say if we change a stereo match. Here
we are interested in achieving the fastest overall solution,
and the choice is not so clear. We can use representation
(4.4) and solve the system (using standard Cholesky
factorization) in time $(1/6)\,n^3$ but have to pay a larger constant
factor for each surface point: the kernel is significantly more
expensive to evaluate. Or we could use the simpler kernels
and factor the matrix (see below), and then using a more
complicated algorithm also solve the system in $(1/6)\,n^3$ (with a
much larger constant in the terms of order $n^2$). To do the
latter we would factor the matrix as follows:

\[
A = \begin{pmatrix} Q & C \\ C^T & W \end{pmatrix}
  = \begin{pmatrix} I_{2d} & 0 \\ C^T Q^{-1} & I_{n-2d} \end{pmatrix}
    \begin{pmatrix} Q & 0 \\ 0 & R \end{pmatrix}
    \begin{pmatrix} I_{2d} & Q^{-1} C \\ 0 & I_{n-2d} \end{pmatrix}
\]

where Q is 2d by 2d, W is (n-2d) by (n-2d), and
$R = W - C^T Q^{-1} C$.

Then we can use Cholesky factorization on R (this is
because all the negative eigenvalues are in Q, so R must be
positive definite as required for the Cholesky algorithm).
This takes time $(1/6)\,n^3$. We then have three triangular
systems which can be back-solved to arrive at the answer.
Since the size of Q is fixed (it is 2d, where d is the
dimension of the null space of the semi-norm being used),
all the calculations with Q and $Q^{-1}$ can be done in constant
time. Note that the factorization does increase the condition
number of the system (4.2), but only by a constant factor.
See Boult (1986) for more details.
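The block elimination described above can be sketched numerically. The code assumes a symmetric matrix partitioned as [[Q, C], [C^T, W]] with a small nonsingular (possibly indefinite) block Q and a positive definite Schur complement R = W - C^T Q^{-1} C. The test matrix is made up to have exactly that property, and the function name is hypothetical. Since NumPy has no dedicated triangular solver, the two triangular solves with the Cholesky factor are done with the general `numpy.linalg.solve`.

```python
import numpy as np

def solve_by_block_elimination(Q, C, W, b):
    """Solve [[Q, C], [C^T, W]] x = b using a Cholesky factor of the Schur complement."""
    m = Q.shape[0]
    Qinv_C = np.linalg.solve(Q, C)                 # Q^{-1} C, cheap because Q is small
    R = W - C.T @ Qinv_C                           # Schur complement
    L = np.linalg.cholesky(R)                      # fails unless R is positive definite
    b1, b2 = b[:m], b[m:]
    Qinv_b1 = np.linalg.solve(Q, b1)
    y2 = b2 - C.T @ Qinv_b1                        # eliminate the first block
    x2 = np.linalg.solve(L.T, np.linalg.solve(L, y2))   # R x2 = y2 via L L^T
    x1 = Qinv_b1 - Qinv_C @ x2                     # back-substitute for the first block
    return np.concatenate([x1, x2])

rng = np.random.default_rng(2)
Q = np.diag([2.0, -1.0, 3.0, -2.0])                # small indefinite block (made up)
C = rng.normal(size=(4, 6))
M = rng.normal(size=(6, 6))
R_target = M @ M.T + 6.0 * np.eye(6)               # positive definite by construction
W = R_target + C.T @ np.linalg.solve(Q, C)         # so that W - C^T Q^{-1} C = R_target
A = np.block([[Q, C], [C.T, W]])
b = rng.normal(size=10)
x = solve_by_block_elimination(Q, C, W, b)
```

Only the Cholesky factorization of R scales with the number of data points. Every operation involving Q works on a block of fixed size.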


§4.3.5  Stereo Data Gathered Sequentially:
Adding Data and Removing Last Added Point
the Only Allowable Updates,
and Speed is Most Important.

In this example we assume that we are given a subset of
the data all at once, and we may wish to add points to the
information and at other times remove the last point we
added (although we can change the value of any data point
at any time). Here we are again faced with the same choices
as in (4.3.4). In fact we use the same algorithms with the
exception that we make provisions to add and delete the final
row and column from the Cholesky decomposition (and in
the factorization described in 4.3.4 we must also update the
matrix C and $C^T$). The resulting algorithms still have cost
$(1/6)\,k^3$ but with the added cost of $O(k^2)$ for the deletions.
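Adding or removing the final row and column of a Cholesky factor can be done without refactoring. The sketch below shows the standard bordering update for a symmetric positive definite matrix. It is an illustration, not the code used by the author, and the function names and test matrix are hypothetical.

```python
import numpy as np

def forward_substitute(L, b):
    """Solve L y = b for lower-triangular L in O(k^2) operations."""
    y = np.zeros(len(b))
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def cholesky_append(L, new_col, new_diag):
    """Given A = L L^T, return the factor of A bordered by one new row and column."""
    k = L.shape[0]
    row = forward_substitute(L, new_col)           # solves L row = new_col
    pivot_sq = new_diag - row @ row
    if pivot_sq <= 0.0:
        raise ValueError("bordered matrix is not positive definite")
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L
    L_new[k, :k] = row
    L_new[k, k] = np.sqrt(pivot_sq)
    return L_new

def cholesky_drop_last(L):
    """Removing the last row and column of A just truncates its factor."""
    return L[:-1, :-1].copy()

rng = np.random.default_rng(3)
M = rng.normal(size=(6, 6))
A = M @ M.T + 6.0 * np.eye(6)                      # made-up positive definite matrix
L5 = np.linalg.cholesky(A[:5, :5])                 # factor for the first five points
L6 = cholesky_append(L5, A[:5, 5], A[5, 5])        # add the sixth point
L5_again = cholesky_drop_last(L6)                  # remove it again
```

Only the last added point can be removed this cheaply. Deleting an interior row changes all later rows of the factor, which is why the next situation uses a different method.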

§4.3.6  Stereo Data Gathered Sequentially:
Adding Data and Removing Any Data
and Accuracy Most Important.

In this example we assume that we are given a subset of
the data all at once, and we may wish to add points to the
information and remove arbitrary points; also, we can
change the value of any data point at any time. We desire to
find the most accurate solution. In this case we would use
representation (4.1) and solve the system (4.3) using a stable
QR decomposition with updating (see Daniel et al (1976)),
which costs about $3j^2$ for each addition or deletion
(assuming there are j points in the current representation) for
a total cost of about $(3/2)\,k^3$ for k data points assuming the
number of deletions is a constant.


§5  Future plans.

Much of the work described in this paper is currently
under investigation. In particular, the psychology
experiments described in section 2 are under way, and shall
be reported on in Boult (1986). As part of the experiments
we shall implement the optimal algorithms for all the
different classes described in section 2.4, plus numerous
other classes.

The analysis and comparison of alternative realizations
will continue. The more we analyze, the more new ways of
realizing the solution and implementing those realizations we
have found. In addition, new advances in the solution of
linear systems (either in algorithm or in hardware development)
can always be applied. We shall examine how the
class choice affects both the numerical accuracy of the
solution and the computational complexity of the resulting
algorithms.


§6  References. 

Allen, Peter, (1985): Object Recognition Using Vision and
Touch, Ph.D. dissertation, University of Pennsylvania.

Arthur, D.W. (1974): Multivariate Spline Functions I:
Construction, Properties, and Computation, J.
Approximation Theory, #12, p396-411.

Atteia, Marc, (1966): Étude de Certains Noyaux et Théorie des
Fonctions «Spline» en Analyse Numérique, Docteur Thèse,
De l'Institut de Mathématiques Appliquées de Grenoble,
France.

Boult, Terrance, (1985a): Reproducing Kernels for Visual
Surface Interpolation, Columbia University Computer Science
Department Technical Report.

Boult, Terrance, (1985b): Visual Surface Interpolation: A
Comparison of Two Methods, to appear in these
proceedings, DARPA Image Understanding Workshop,
December 1985.

Boult, Terrance, (1985c): Smoothness Assumptions in Human
and Machine Vision: Their Implications for Optimal Surface
Interpolation, Columbia University, Computer Science
Department Technical Report.

Boult, Terrance, (1986): Information Based Complexity:
Applications in Nonlinear Equations and Computer Vision,
Doctoral dissertation, in preparation.

Daniel, J.W., W.B. Gragg, L. Kaufman, and G.W. Stewart
(1976): Reorthogonalization and Stable Algorithms for
Updating the Gram-Schmidt QR Factorization, Math.
Comp., v30, p772-795.

Franke, R. (1984): Thin Plate Splines with Tension, to appear
CAGD.

Grimson, W.E.L., (1981): From Images to Surfaces: A
Computational Study of the Human Early Visual System,
MIT Press, Cambridge, MA.

Kender, John, David Lee and Terrance Boult, (1985):
Information Based Complexity Applied to the 2 1/2 D Sketch,
Proceedings of the Third IEEE Workshop on Computer
Vision: Representation and Control, p157-167.

Marr, David (1977): Artificial Intelligence - A Personal View,
Artificial Intelligence 9.

Mayhew, John, (1982): The Interpretation of Stereo Disparity
Information: The Computation of Surface Orientation and
Depth, Perception, vol 11, p.387-403.

Meinguet, Jean, (1979a): Multivariate Interpolation at Arbitrary
Points Made Simple, Journal of Applied Mathematics and
Physics 30, 292-304.

Meinguet, Jean, (1979b): Basic Mathematical Aspects of Surface
Spline Interpolation, ISNM 45: Numerische Integration,
211-220, G. Hämmerlin ed., Basel: Birkhäuser Verlag.

Terzopoulos, Demetri, (1984): Multiresolution Computation of
Visual Surface Representation, Ph.D. thesis, MIT.

Traub, J.F. and H. Woźniakowski, (1980): A General Theory
of Optimal Algorithms, Academic Press, NY.

Traub, J.F., G. Wasilkowski and H. Woźniakowski, (1983):
Information, Uncertainty and Complexity, Addison Wesley,
MA.




LABELLING  LINE  DRAWINGS  OF  CURVED  OBJECTS 


Jitendra Malik

Computer Science Department
Stanford University, Stanford, California 94305


Abstract 

How a line drawing is interpreted as the projection
of a scene of three dimensional objects is a problem of
fundamental importance in Computational Vision. This can be viewed
as a two step process: (1) labelling the line drawing and (2)
inferring surface constraints from the labelling by further geometric
reasoning.

This paper describes an algorithm for the labelling problem for
line drawings of opaque, curved objects bounded by piecewise
smooth surfaces. Each image curve is classified as the projection
of a limb, the locus of points on the surface where the line of
sight is tangential to the surface, or as an edge, a tangent plane
discontinuity. Additionally each edge is classified as convex, concave
or occluding. The scheme is mathematically rigorous and is
complete for scenes with no surface markings or shadows.


1  Introduction 


Line drawing interpretation is one of the most important modules
in visual perception. A line in a drawing can correspond in
the scene to a discontinuity in depth, surface orientation, surface
reflectance or illumination. We'll however deal with a simplified
model of the world where the objects have no surface marks or
texture and where the lines due to illumination discontinuities like
shadows and specularities have been removed in some preprocessing
step. The local structure of the intensity surface can be
used to do this [22]. Each line (image curve) can then be classified
as either the projection of a limb, the locus of points on the
surface where the line of sight is tangent to the surface, or as an
edge, a tangent plane discontinuity. Additionally each edge can
be classified as a convex, concave or occluding edge. For occluding
edges and limbs we would like to infer which of the two surfaces
bordering the curve in the line drawing is nearer in the scene.
These inferences can be represented by giving each line one of 6
possible labels.

1. A '+' label represents a convex edge, an orientation
discontinuity such that the two faces meeting along the edge
in the scene enclose a dihedral angle greater than π.

2. A '-' label represents a concave edge, an orientation
discontinuity such that the two faces meeting along the edge
in the scene enclose a dihedral angle less than π.

3. A '←' or a '→' represents an occluding convex edge. When
viewed from the camera both the surface patches which
meet along the edge lie on the same side, one occluding the
other. As one moves in the direction of the arrow, these
surfaces are to the right.

4. A '←←' or a '→→' represents a limb. Here the surface
smoothly curves around to occlude itself. As one moves in
the direction of the twin arrows, the surface lies to the right.

Of the $6^n$ combinatorially possible label assignments to the n
lines in a drawing only a small number are physically possible.
The determination of these is the labelling problem and is the
primary subject of this paper.
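To make the size of the unconstrained search space concrete, the six labels and the $6^n$ raw assignments can be enumerated directly. This is only a sketch. The enumeration member names and the ASCII renderings of the arrow labels are choices made here, not notation from the paper.

```python
from enum import Enum
from itertools import product

class LineLabel(Enum):
    CONVEX = "+"
    CONCAVE = "-"
    OCCLUDING_FORWARD = "->"     # occluding convex edge, surfaces to the right of the arrow
    OCCLUDING_BACKWARD = "<-"
    LIMB_FORWARD = "->>"         # limb, surface to the right of the twin arrows
    LIMB_BACKWARD = "<<-"

def all_assignments(num_lines):
    """Lazily yield every combinatorially possible labelling of num_lines lines."""
    return product(LineLabel, repeat=num_lines)

# For a drawing with 3 lines there are 6**3 = 216 raw assignments.
count = sum(1 for _ in all_assignments(3))
```

A labelling algorithm has to discard most of these assignments without enumerating them, since the count grows exponentially with the number of lines.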

2  Review of past work

The first successful attempt to solve this problem was made by
Huffman [7] and Clowes [5] in 1971. They exhaustively cataloged
the vertices that could arise in line drawings of trihedral objects
(objects whose corners are formed by exactly three meeting faces)
and then used the catalog to interpret lines as corresponding to
convex, concave or convex occluding edges. The catalog gives
the possible labellings at each junction and global consistency is
forced by the rule that each line in the drawing be assigned one
and only one label along its length. Waltz [20] proposed an
algorithm for this problem (actually for an augmented version with
shadows, cracks, and separably concave edges) which reduced the
search by a filtering step in which adjacent pairs of junctions are
examined and incompatible candidate labellings discarded.
Mackworth [12] developed the concept of Gradient Space which enabled
his program to label line drawings of arbitrary polyhedral scenes.
One consequence of this attempt to deal with an arbitrary number
of planes meeting at a vertex was a combinatorial explosion
in the number of labellings generated which correspond to highly
counter-intuitive interpretations. Draper points out that there
are 33 legal labellings (more if accidental viewpoint is allowed) for
the line drawing of a tetrahedron. In the context of the Origami
World Kanade [8] had to face a similar problem.

For objects bounded by curved surfaces, there have been two
major efforts. The first was by Turner [19] who used a heuristic
procedure called the PC (polyhedral to curved) transformation
to obtain the labelling possibilities for junctions. Turner's
approach suffered from several basic weaknesses:

•  Unlike the Huffman-Clowes procedure which was obviously
complete for the class being considered, such a convincing
claim cannot be made for Turner's procedure.

•  Turner's procedure is limited to objects such that each face
is only one type of surface: planar, parabolic but non-planar,
elliptic or hyperbolic. A simple object like a torus which is
bounded by a single smooth surface which has both elliptic
and hyperbolic patches cannot be handled.

•  A major problem is the huge number of junction labels and
the consequent explosion in the number of legal interpretations.
See Table 1. This is to be contrasted with the small
size of the Huffman-Clowes catalog.

The next major effort was by Shapira-Freeman-Chakravarty
[17], [1]. They considered objects such that exactly three faces
meet at a vertex where each face is either a quadric surface or a
plane. Their junction catalog is much smaller than that of Turner
which makes it practically usable in certain situations. However
some fundamental weaknesses remain:

1. No arguments are given to prove the validity of the junction
catalog. One is left with the suspicion that it was derived
by observation of junctions in some typical curved objects.

2. The scheme is limited to objects bounded by quadric
surfaces/planes and exactly three faces meeting along a vertex.

3. For a non-occluding edge, no distinction is made between
convex and concave edges.

Corner    Labels
p.  Ij'P  p.  GO 2  <•.  ir.  31(4  C,P,  138  <-ll*.  1003  1^1  8  E.  205  IMP.  138  E,r7  M3 j

Table 1: Number of labellings for each of Turner's corner classes

Lee, Haralick and Zhang [11] extend this catalog by adding
line labels based on Huffman-Clowes rules. The justification for
the validity of this step is not given. While this partially solves
problem 3 mentioned above the first two weaknesses remain. For
a detailed discussion we would refer the reader to [13] where we
also point out some mistakes in these catalogs.

3  Criteria for evaluating a labelling scheme

We list here some 'self-evident' criteria which one would like a
scheme for labelling curved objects to satisfy.

1. It should be able to handle a broad range of objects: n faces
meeting at a vertex with each face a portion of a general
surface. A restriction to planes or quadric surfaces is too
limiting.

2. The scheme should be derived in a mathematically rigorous
way. It should be possible to state the precise conditions
under which the scheme is valid.

3. The algorithm should not generate too many labellings.
Preferably they should correspond to the interpretations
found by human observers.

4. The labellings generated should be physically realisable.

5. Labellings should be found without too much search.

6. The scheme should be robust with respect to errors in the
input line drawing.

In Section 11, we will evaluate our scheme for its success (or
failure!) in meeting these criteria.

4  Modelling the Scene and the Projection

The scene consists of a set of objects in three dimensional space.
An object is a connected, bounded and regular subset of $R^3$ whose
boundary is a $C^3$ piecewise smooth surface. By regular we mean
that it is the closure of its interior. This disallows objects with
'dangling' faces or edges. Imposing this condition is a standard
practice in solid modelling. The formal definition of piecewise
smooth surface is somewhat complicated and the interested reader
is referred to [15] for the details. Basically a piecewise smooth
surface consists of portions of smooth surfaces joined together. The
surface of a polyhedron or a finite cylinder are simple examples.
If two $C^3$ surfaces $F(x,y,z) = 0$ and $G(x,y,z) = 0$ intersect at a
point P where the surfaces have distinct tangent planes, then it
can be shown that the part of the intersection of the two surfaces
near P is a smooth $C^3$ arc. Such an arc on a piecewise smooth
surface is called an edge. A point of intersection of three or more
edges is called a vertex. Each maximal connected smooth portion
of the surface is called a face. These definitions reduce to the
standard definitions of faces, edges and vertices for polyhedra.

Example. A right circular cylinder is a $C^3$ piecewise smooth
surface with 3 faces, 2 edges and 0 vertices.

Example. A sphere or a torus is a piecewise smooth surface with
1 face, 0 edges, and 0 vertices.

We would like to point out that our definition of piecewise
smooth surfaces differs somewhat from some other definitions,
which do not require that the tangent planes be distinct across
an edge. An example of a surface which is not piecewise smooth
under our definition is shown in Figure 2. The cone, if its apex is
included, is also not a piecewise smooth surface.

Figure 2: Not a piecewise smooth surface

Two kinds of mappings may be considered: orthographic and
perspective. In this paper we'll limit ourselves to orthographic
projection which corresponds to the eye/camera being effectively
at infinite distance from the scene. If the viewing direction is
along the z axis, a point $(x, y, z)$ in the scene projects to the point
$(x, y)$ in the image plane and is visible if there is no other point
$(x, y, z')$ belonging to any object in the scene with $z' < z$. The
projection of the scene is the projection of the visible points in
the scene.

The viewpoint is assumed to be general: there exists an
open neighborhood of the vantage point in which the 'topological'
structure of the line drawing remains unaltered. This will be made
more precise in the next section.
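The visibility rule for orthographic projection along the z axis can be sketched for a finite set of sample points. The discretisation is an assumption made here. Points are treated as sharing an image location when they fall in the same square image cell, and in each cell the point with the smallest z is taken to be the visible one. The function name and the sample data are hypothetical.

```python
import numpy as np

def visible_points(points, cell=1.0):
    """Indices of the sample points that survive orthographic projection along z.

    Points that fall in the same image cell compete, and the one with the
    smallest z (no other point with z' < z) is visible.
    """
    nearest = {}
    for idx, (x, y, z) in enumerate(points):
        key = (int(np.floor(x / cell)), int(np.floor(y / cell)))
        if key not in nearest or z < points[nearest[key]][2]:
            nearest[key] = idx
    return sorted(nearest.values())

samples = np.array([
    [0.2, 0.2, 5.0],    # hidden: shares a cell with the next point, larger z
    [0.3, 0.1, 2.0],    # visible
    [1.5, 0.5, 7.0],    # visible: alone in its cell
])
visible = visible_points(samples)
image_coords = samples[visible][:, :2]   # (x, y, z) projects to (x, y)
```

This is the familiar depth-buffer idea. The continuous definition in the text corresponds to the limit of vanishing cell size.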

5  The Line Drawing

The only lines we will be concerned with are the projections of
depth and orientation discontinuities in the scene. The curves
in the line drawing are segmented at tangent/curvature
discontinuities. The points where there are tangent/curvature
discontinuities are referred to as junctions. Figure 3 is a sample line
drawing. Endings of image curves are also referred to as
junctions, as are points where two or more image curve segments
meet.

From the line drawing an image structure graph may be
constructed. It is an undirected graph. Its nodes are all the junctions
in the line drawing and additionally pseudo-junctions, one
for each isolated smooth closed curve. Each image curve
segment corresponds to an arc between the vertices corresponding
to the junction/s on which it is incident. In Figure 3, for example,
one node has three arcs, the first two going to the same
neighbouring node and the third to a different node; another node
also has three arcs, and some nodes have only one arc each. As is
obvious, in general the Image Structure Graph (henceforth the ISG)
may be disconnected, have more than one arc between two nodes
and have self loops.
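Because the graph may contain parallel arcs and self loops, a plain adjacency set is not enough. One simple representation is an explicit list of arcs, as in the sketch below. The class name, method names and the junction and curve identifiers are all hypothetical and do not correspond to the labels of Figure 3.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ImageStructureGraph:
    nodes: set = field(default_factory=set)
    arcs: list = field(default_factory=list)     # (curve_id, node_a, node_b)

    def add_curve(self, curve_id, node_a, node_b):
        """One arc per image curve segment; node_a == node_b gives a self loop."""
        self.nodes.update((node_a, node_b))
        self.arcs.append((curve_id, node_a, node_b))

    def degree(self, node):
        """Number of arc ends at a node; a self loop counts twice."""
        return sum((a == node) + (b == node) for _, a, b in self.arcs)

    def component_count(self):
        """Number of connected components, found by depth-first search."""
        adjacency = defaultdict(set)
        for _, a, b in self.arcs:
            adjacency[a].add(b)
            adjacency[b].add(a)
        seen, count = set(), 0
        for start in self.nodes:
            if start in seen:
                continue
            count += 1
            stack = [start]
            while stack:
                node = stack.pop()
                if node not in seen:
                    seen.add(node)
                    stack.extend(adjacency[node] - seen)
        return count

isg = ImageStructureGraph()
isg.add_curve("c1", "a", "b")     # two curve segments between the same pair of junctions
isg.add_curve("c2", "a", "b")
isg.add_curve("c3", "a", "c")
isg.add_curve("c4", "p", "p")     # isolated closed curve: pseudo-junction with a self loop
```

The example graph is disconnected, has a pair of parallel arcs and has one self loop, covering the three possibilities noted in the text.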

By considering the observed geometric properties in the line
drawing these junctions can be classified as follows:

Termination: Curve ends there.

L: Tangent discontinuity across junction.

Curvature L: Tangents continuous, curvature discontinuity.

T Junction: 2 of 3 image curves at the junction have same slope,
curvature.

Pseudo: Corresponds to isolated closed smooth curves.

Three-Tangent: 3 curves with common tangent. 2 have same
curvature, e.g. $j_4$.

Arrow: 3 curves with distinct tangents. One angle > π.

Y: 3 curves with distinct tangents. No angle > π.

Multi: 4 or more image curves at the junction.

The ISG is augmented by storing at each node an attribute field
corresponding to which of the above classes the junction belongs to.

We can now define precisely what we mean by general
viewpoint. The augmented ISGs corresponding to the line drawings of
the scene when viewed from different points of a sufficiently small
open neighbourhood (a ball) of the vantage point are isomorphic.
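The angle test that separates Arrow from Y junctions can be written down directly. The sketch below takes the three tangent directions of the curves leaving a junction as angles in radians and looks at the three angular gaps between neighbouring directions. Treating a gap of exactly π as a T junction is a simplification made here, since the definition above also requires equal curvature. Three-Tangent junctions are not handled, and the function name is hypothetical.

```python
import math

def classify_three_curve_junction(angles, tol=1e-9):
    """Classify a junction of three image curves from their tangent directions."""
    if len(angles) != 3:
        raise ValueError("expected exactly three tangent directions")
    a = sorted(x % (2.0 * math.pi) for x in angles)
    gaps = [a[1] - a[0], a[2] - a[1], 2.0 * math.pi - (a[2] - a[0])]
    if any(abs(g - math.pi) <= tol for g in gaps):
        return "T"          # two curves share a slope (curvature is not checked here)
    if any(g > math.pi for g in gaps):
        return "Arrow"      # one angle exceeds pi
    return "Y"              # no angle exceeds pi

y_like = classify_three_curve_junction([0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0])
arrow_like = classify_three_curve_junction([0.0, math.pi / 4.0, -math.pi / 4.0])
t_like = classify_three_curve_junction([0.0, math.pi, math.pi / 2.0])
```

In practice the tolerance would have to reflect the accuracy with which tangents can be measured in the line drawing.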


Figure 3: A Sample Line Drawing


6  Projection  of  curved  objects 

We wish to study how curved objects project to a line drawing.
This is done by studying how neighborhoods of different kinds of
points on the surface project and cataloging the resulting
junctions in the line drawing. This can be broken up into three cases:

•  The projection of a neighborhood of an interior point of a
face.

•  The projection of a neighborhood of an interior point of an
edge.

•  The projection of a neighborhood of a vertex.

which are tackled respectively in Sections 7, 8 and 9.

As we are dealing with the projection of opaque surfaces, we
also have to worry about the phenomenon of occlusion: obstruction
of the view of the surface of an object by another object (or
another part of the same object). This gives rise to T junctions.
Here we know that the top of the T junction corresponds to a
nearer surface occluding another object. See Figure 4. Note that
there is no constraint on the label of the stem of the T junction.


Figure 4: Labels for a T Junction


7  Projection  of  a  face  neighborhood 

This corresponds to the projection of a single $C^3$ surface element.
It is an instance of the class of mappings from two dimensional
manifolds to two dimensional manifolds. Whitney in 1955 studied
the singularities of such mappings and showed that generically
there are only two singularities: the fold and the cusp. This result
is discussed in Section 7.1. In Section 7.2 we study Whitney's
theorem in the context of the projection mapping. Limbs are
associated with the fold singularity and terminations (Section 5)
are associated with cusps. At a termination, we can determine
which of the two surface patches is nearer.


7.1  YVhitnoy's  Singularity  Theory 

In 1955, Whitney [21] published a landmark paper on the
singularities of mappings of open sets in E^2 into E^2. Examples of
such mappings are projection (orthographic and perspective) of
surfaces and the Gauss map. Whitney showed that generically
there are only two kinds of singularities: folds lying along curves
and isolated cusp points lying on the folds. We will explain what
this means in the context of projection in the next section. In
this section we'll define various terms and try to give a feel for
Whitney's results.

Such a mapping is defined by the two functions u_1 = f(x_1, x_2)
and u_2 = g(x_1, x_2), where (x_1, x_2) and (u_1, u_2) are the coordinates
in the two spaces. The mapping is said to be C^k if the functions
f and g have continuous partial derivatives of order <= k. Let J
be the Jacobian of the mapping. A point p is said to be a regular
or singular point according as J(p) ≠ 0 or J(p) = 0. We are
interested in studying the locus of the singularities.

Example 1. Consider the mapping

    u_1 = x_1^2,   u_2 = x_2

The Jacobian J = 2 x_1 = 0 ==> x_1 = 0. The straight line x_1 = 0
is the locus of points where the mapping is singular. This is an
example of a fold.

Example  2.  Consider  the  mapping 

    u_1 = x_1^3 - x_1 x_2,   u_2 = x_2


Figure 5: Two canonical examples




The Jacobian J = 3 x_1^2 - x_2 = 0 ==> x_2 = 3 x_1^2. This corresponds
to two fold curves, one for positive x_1 and one for negative x_1,
both in the half plane where x_2 > 0. The two fold curves meet at
the origin at a cusp point. This singularity occurs whenever two
fold curves come together and disappear.
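The two examples can be checked numerically. The sketch below is only an illustration, not part of the original analysis: it estimates the Jacobian determinant of each mapping by central differences and tests whether a point is singular. The function names, the step size h and the tolerance are assumptions chosen for the example.

```python
def fold_map(x1, x2):
    """Example 1: u1 = x1^2, u2 = x2."""
    return (x1 ** 2, x2)


def cusp_map(x1, x2):
    """Example 2: u1 = x1^3 - x1*x2, u2 = x2."""
    return (x1 ** 3 - x1 * x2, x2)


def jacobian_det(mapping, x1, x2, h=1e-6):
    """Central-difference estimate of the determinant of the 2x2 Jacobian."""
    u1p, u2p = mapping(x1 + h, x2)
    u1m, u2m = mapping(x1 - h, x2)
    v1p, v2p = mapping(x1, x2 + h)
    v1m, v2m = mapping(x1, x2 - h)
    du1_dx1 = (u1p - u1m) / (2 * h)
    du2_dx1 = (u2p - u2m) / (2 * h)
    du1_dx2 = (v1p - v1m) / (2 * h)
    du2_dx2 = (v2p - v2m) / (2 * h)
    return du1_dx1 * du2_dx2 - du1_dx2 * du2_dx1


def is_singular(mapping, x1, x2, tol=1e-6):
    """A point is singular when the Jacobian determinant vanishes there."""
    return abs(jacobian_det(mapping, x1, x2)) < tol
```

For the fold, every point with x_1 = 0 is reported singular; for the cusp, the singular points are those on the parabola x_2 = 3 x_1^2, which lies entirely in the half plane x_2 >= 0.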

Our choice of examples was not accidental. After suitable
coordinate transformations all folds and cusps can be described
by the canonical forms in Examples 1 and 2 respectively. Whitney
showed that every singularity of a smooth mapping from E^2 to
E^2, after an appropriate small deformation, splits into folds and
cusps. As we discuss in the next section, this generic property in
the vision context corresponds to general viewpoint.



Figure 6: Inferring the labelling from terminations


7.2 Singularities of the projection mapping

Projection is a mapping from a surface onto a plane. One can
immediately interpret Examples 1 and 2 as corresponding to the
orthographic projection of two surfaces of the form y = f(x,z)
viewed from an infinitely distant point on the z-axis. Figure 5
shows the two surfaces. For y = z^2 the projection of the fold
curve is the line y = 0.

For y = z^3 - zx the two fold curves have the equation

    x = 3 z^2

one for z positive, the other for z negative.

By eliminating z we get the equations of the projected curves

    y = z^3 - z(3 z^2) = -2 z^3 = -2 (±sqrt(x/3))^3 = ∓ (2/(3 sqrt(3))) x^(3/2)

This is a semicubical parabola with a cusp at the origin. Only the
positive branch is visible. Note that the contour ends concavely.
For an extended discussion of this see [10]. This fact can be used
to determine to which side the curve belongs.
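As a quick check of the elimination step, the following sketch maps points of the fold curve on y = z^3 - zx into the image and tests them against the implicit form 27 y^2 = 4 x^3 of the semicubical parabola. The function names and tolerances are illustrative assumptions, not taken from the paper.

```python
import math


def projected_fold_point(z):
    """Image (x, y) of the fold point with parameter z on y = z^3 - z*x."""
    x = 3.0 * z ** 2       # fold condition: d/dz (z^3 - z*x) = 3 z^2 - x = 0
    y = z ** 3 - z * x     # equals -2 z^3 on the fold
    return x, y


def on_semicubical_parabola(x, y, tol=1e-9):
    """Check 27 y^2 = 4 x^3, the implicit form of y = -/+ 2 (x/3)^(3/2)."""
    return math.isclose(27.0 * y * y, 4.0 * x ** 3, rel_tol=1e-9, abs_tol=tol)
```

Both signs of z land on the same implicit curve, one on each branch, and the two branches meet at the cusp z = 0.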

What does all this imply for the labelling problem? The only
curves which exist in the projection of a smooth surface patch
are limbs, and the only junctions are terminations. In the line
drawing each limb projection borders two regions, or sometimes
two strips of the same region. The limb curve lies on the surface
patch corresponding to one of these strips and is in front of the
other surface patch. Which is the nearer patch can be determined
by looking at the curvature of the projection of the limb (in the
image plane) at a termination junction as shown in Figure 6. If the
scene consists only of objects bounded by single smooth surfaces (no
edges), then the only junctions in the line drawing would be T
junctions and terminations.


8  Projection  of  an  edge  neighborhood 

An edge e is the intersection of two surface patches S_1 and S_2
with different tangent planes. Consider a point P in the interior
of the edge. To study how P and its neighbourhood in S_1, S_2
project we have to consider three cases:

•  No limb through P on either S_1, S_2.

•  Limb through P on both of S_1, S_2.

•  Limb through P on one of S_1, S_2.


The first case is easy. Since there is no limb on either patch,
the projections of both the patches are diffeomorphisms. The edge
segment in the neighborhood of P is the boundary of both patches
and hence the boundary of both their projections. The labels on
the edges on both sides of P are the same.

The second case is also easy. As there is a limb on S_1, the ray
from the viewer must lie on the tangent plane to S_1 at P. Similarly
it must lie on the tangent plane to S_2 at P. It therefore must lie
on their intersection, which is a straight line. In other words, the
vantage point is constrained to lie along a line, which violates the
general viewpoint assumption.

The third case is more interesting. Without loss of generality
we can assume that S_1 has a limb passing through P. For S_2
we will assume a general equation and then do the case analysis.
Cartesian coordinates are introduced with the origin at P. The
eye is at infinite distance along the z axis, so that the projection
is on the x-y plane. See Figure 7.

For surface S_1

    f_1 = a_2 z^2 + a_3 xz + a_4 x^2

We are using Whitney's result on the normal form of the limb.
As here the direction of projection is fixed we do not have the
freedom to get rid of the mixed term.

Now, the limb is given by

    ∂f_1/∂z = 2 a_2 z + a_3 x = 0  ==>  z = -a_3 x / (2 a_2)

By substituting back we get

    y_l = (a_4 - a_3^2 / (4 a_2)) x^2

This is the equation of the limb on S_1 in the neighborhood of the
origin.

For surface S_2

    f_2 = b_0 x + b_1 z + b_2 z^2 + b_3 xz + b_4 x^2

As before, any limb on this patch would be given by

    ∂f_2/∂z = b_1 + 2 b_2 z + b_3 x = 0  ==>  z = -(b_1 + b_3 x) / (2 b_2)

By substituting back we get the limb equation

    y = b_0 x + b_4 x^2 - (b_1 + b_3 x)^2 / (4 b_2)

Recall that we have assumed that there is no limb on this patch
passing through P. This implies that b_1 ≠ 0.

The equation of the intersection curve f_1 - f_2 = 0 is given by

    -b_0 x - b_1 z + (a_2 - b_2) z^2 + (a_3 - b_3) xz + (a_4 - b_4) x^2 = 0


Figure 8: The four quadrants


from which

    dz/dx = [b_0 - 2(a_4 - b_4) x - (a_3 - b_3) z] / [-b_1 + 2(a_2 - b_2) z + (a_3 - b_3) x]

What we really want is the slope of the tangent to the projected
edge

    dy/dx = (∂f_1/∂z)(dz/dx) + ∂f_1/∂x

Substituting we get

    dy/dx = (2 a_2 z + a_3 x) [b_0 - 2(a_4 - b_4) x - (a_3 - b_3) z] / [-b_1 + 2(a_2 - b_2) z + (a_3 - b_3) x]
            + (a_3 z + 2 a_4 x)

As b_1 ≠ 0, dy/dx = 0 at the origin, which means that the projection
of the intersection curve is tangent to the projection of the limb
curve.
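The tangency result can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: S_1 is taken as y = a_2 z^2 + a_3 xz + a_4 x^2 and S_2 as y = b_0 x + b_1 z + b_2 z^2 + b_3 xz + b_4 x^2 with b_1 ≠ 0, the edge is traced by Newton's method starting from z = 0, and the slope of its projection at the origin is estimated by a central difference. All function names and the step size are hypothetical.

```python
def edge_z(x, a, b, iters=50):
    """Root z near 0 of f1(x, z) - f2(x, z) = 0, found by Newton's method.

    a = (a2, a3, a4) are the coefficients of S1; b = (b0, b1, b2, b3, b4) of S2.
    """
    a2, a3, a4 = a
    b0, b1, b2, b3, b4 = b
    z = 0.0
    for _ in range(iters):
        F = -b0 * x - b1 * z + (a2 - b2) * z * z + (a3 - b3) * x * z + (a4 - b4) * x * x
        dF = -b1 + 2 * (a2 - b2) * z + (a3 - b3) * x
        z -= F / dF
    return z


def projected_edge_y(x, a, b):
    """Image height y of the edge point above x, measured on S1."""
    a2, a3, a4 = a
    z = edge_z(x, a, b)
    return a2 * z * z + a3 * x * z + a4 * x * x


def edge_slope_at_origin(a, b, h=1e-5):
    """Central-difference estimate of dy/dx of the projected edge at P."""
    return (projected_edge_y(h, a, b) - projected_edge_y(-h, a, b)) / (2 * h)
```

The projected limb y = (a_4 - a_3^2/(4 a_2)) x^2 has slope 0 at the origin, so a computed edge slope close to 0 agrees with the tangency claim.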

We are now ready to list the junctions which arise from the
projection of a sufficiently small neighborhood of P. Note that
S_1 in the neighborhood of P is a cylindrical patch. (This is from
Whitney's theorem: by a suitable choice of coordinates the surface
near a fold curve can be described by y = z^2.) Let TP_1 be
the tangent plane to S_1 at P. The viewpoint is constrained to
be in this plane. Let TP_2 be the tangent plane to S_2 at P. S_1
and S_2 divide the three dimensional space in the neighborhood of
P into four quadrants as shown in Figure 8. By putting in solid
material in various quadrants and viewing from various directions
in TP_1 we can generate all possibilities. The reader will note
the similarity of this procedure to the Huffman-Clowes procedure
described in [7]. To begin the case analysis:


1. Solid material in only one quadrant. This gives two subcases.

   (a) Solid material in quadrant 1. Depending on whether
       the viewpoint is in the upper or lower half plane we
       get the two junctions in Figure 9. Recall that we had
       shown earlier that the projection of the edge is tangent
       to the projection of the limb.


Figure 9: Viewing quadrant 1.

   (b) Solid material in quadrant 2. This gives us the 'junction'
       in Figure 10. Note that as the limb curve itself
       is occluded, unlike the other junctions, this cannot be
       identified because it does not correspond to any
       tangent/curvature discontinuity in the line drawing. We
       have to allow for the possibility of this junction being
       present by introducing 'phantom nodes' on all curved
       lines in the drawing which could correspond to edges.
       (This junction cannot occur on limbs or on concave
       edges.)


Figure 10: A hidden 'junction'




Figure 11: One more labelling for a Curvature-L

2. Solid material in two quadrants. There are two subcases.
   One is when adjacent quadrants are occupied. In that case
   there is only one surface at P. The other case is when
   opposite quadrants are occupied, for example (1,3) or (2,4).
   In this case, we violate our definition of an edge.

3. Solid material in three quadrants. In the first subcase, let 1
   be the empty quadrant. In this case P is hidden. If 2 is the
   empty quadrant, we get the junction in Figure 11.

4. Solid material in all four quadrants. But no surfaces are
   defined.

9  Projection  of  a  vertex  neighborhood 

We begin our analysis of the projection of a vertex with the
observation that under general viewpoint no limb can pass through
a vertex. As the vantage point changes the limb curve moves on the
surface and it intersects the vertex only for a particular viewpoint.
Our next observation is the following result.

Theorem. If two surface elements S_1 and S_2 intersect along a
curve C at the point P, then TP_1, the tangent plane to S_1 at P,
and TP_2, the tangent plane to S_2 at P, intersect along a straight
line L such that at P the projection of C on the image plane is
tangent to the projection of L.

Proof. The coordinate system is the same as in the previous
section. Consider the Taylor series expansions of the graphs of
the two surfaces with P as the origin. The third and higher order
terms can be ignored without loss of generality.

    y_1 = a_0 x + a_1 z + a_2 z^2 + a_3 xz + a_4 x^2

    y_2 = b_0 x + b_1 z + b_2 z^2 + b_3 xz + b_4 x^2

First let us find the equation of the edge curve along which these
surfaces intersect. This is given by

    (a_0 - b_0) x + (a_1 - b_1) z + (a_2 - b_2) z^2 + (a_3 - b_3) xz + (a_4 - b_4) x^2 = 0

from which

    dz/dx = -[(a_0 - b_0) + 2(a_4 - b_4) x + (a_3 - b_3) z] / [(a_1 - b_1) + 2(a_2 - b_2) z + (a_3 - b_3) x]

What we really want is the slope of the tangent to the projected
curve

    dy/dx = (∂y_1/∂z)(dz/dx) + ∂y_1/∂x

    dy/dx = -(a_1 + 2 a_2 z + a_3 x) [(a_0 - b_0) + 2(a_4 - b_4) x + (a_3 - b_3) z] / [(a_1 - b_1) + 2(a_2 - b_2) z + (a_3 - b_3) x]
            + (a_0 + a_3 z + 2 a_4 x)

At the origin, this simplifies to

    dy/dx = (a_1 b_0 - a_0 b_1) / (a_1 - b_1)

Now consider the equations of the two tangent planes

    y_1 = a_0 x + a_1 z

    y_2 = b_0 x + b_1 z

The equation of the intersection line is given by

    (a_0 - b_0) x + (a_1 - b_1) z = 0

Using this to eliminate z we get

    y = [(a_1 b_0 - a_0 b_1) / (a_1 - b_1)] x

which is the same slope as required.
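The theorem can be illustrated numerically. In the sketch below, which is an assumption-laden illustration rather than part of the proof, each surface is given by its five Taylor coefficients in the order (x, z, z^2, xz, x^2), the intersection curve is traced by Newton's method from z = 0, and the slope of its projection at the origin is compared with the slope of the projected intersection line of the tangent planes. The helper names are hypothetical, and a_1 ≠ b_1 is assumed.

```python
def surface_y(c, x, z):
    """Quadratic surface y = c0*x + c1*z + c2*z^2 + c3*x*z + c4*x^2."""
    return c[0] * x + c[1] * z + c[2] * z * z + c[3] * x * z + c[4] * x * x


def curve_z(x, a, b, iters=50):
    """Root z near 0 of y1(x, z) - y2(x, z) = 0 (Newton's method)."""
    d = [ai - bi for ai, bi in zip(a, b)]   # coefficients of y1 - y2
    z = 0.0
    for _ in range(iters):
        F = surface_y(d, x, z)
        dF = d[1] + 2 * d[2] * z + d[3] * x
        z -= F / dF
    return z


def projected_curve_slope(a, b, h=1e-5):
    """Central-difference slope at P of the projected intersection curve."""
    yp = surface_y(a, h, curve_z(h, a, b))
    ym = surface_y(a, -h, curve_z(-h, a, b))
    return (yp - ym) / (2 * h)


def tangent_line_slope(a, b):
    """Slope of the projected intersection line of the two tangent planes."""
    return (a[1] * b[0] - a[0] * b[1]) / (a[1] - b[1])
```

With a = (0.3, 1.1, 0.5, -0.4, 0.2) and b = (-0.6, 0.2, 0.1, 0.7, -0.3), for instance, both slopes come out at -0.8 to within the finite-difference error, independent of the second-order coefficients.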

This result has an immediate consequence. The projection of
a vertex locally looks like the projection of an equivalent
polyhedral vertex constructed by replacing each of the surface
elements by their tangent planes. This results in a great
simplification in the analysis, as all the results on polyhedral
junction labelling that we reviewed in Section 2 become relevant.
We find the equivalent straight line junction by replacing each
image curve at the junction by its tangent and look up its labelling
possibilities from a polyhedral junction catalog. For example, if it
is known a priori that exactly three surface elements meet at a
vertex, then the labelling possibilities are exactly those of the
Huffman-Clowes set. As we want to deal with a more general class,
further analysis will be necessary as in the next subsection.


9.1 Labelling polyhedral junctions

In Section 2, we reviewed perhaps the two most significant pieces
of work on the problem: Huffman-Clowes labelling for the trihedral
world and Mackworth's Gradient Space approach for dealing with
an arbitrary number of surfaces meeting at a vertex. We also noted
Draper's empirical observations on the combinatorial explosion in
the number of alternative interpretations when no restrictions are
made on the number of surfaces meeting at a vertex.

It is clear that we need some way of pruning these weird
interpretations. Kanade [8] exploits with great success the fact
that parallel lines in the scene project to parallel lines in the
image and symmetries project to skewed symmetries. These heuristics
unfortunately are useful only for the class of objects which
have parallel edges and faces with axes of symmetry. We need a
criterion which is more generally applicable. Several attempts
have been made in the past to develop simplicity/regularity
criteria to find the psychologically preferred interpretations of
line drawings. Most of these approaches are limited in that a
global computation is needed in attempting to find some 'global'
simplicity criteria. We limit ourselves to local simplicity. Based
on the observation that all the highly counterintuitive
interpretations involve a number of hidden faces, we define our
criterion: for each junction find the vertex interpretations with
the minimum number of faces meeting at the vertex. The
interpretation should be stable under general viewpoint. Now for
polyhedra at a vertex there are exactly two faces sharing an edge,
and exactly two edges bounding a face. It follows that the number
of edges at a vertex is equal to the number of faces incident at
the vertex. Therefore we have an equivalent version of the rule:
find interpretations involving the minimum number of edges.

An example will make this clearer. Consider an arrow junction.
Each of the three lines which meet at the junction is the
projection of an edge which is incident at the corresponding
vertex. From our simplicity rule, we try to find vertex
interpretations which require only 3 edges or equivalently only
three faces meeting at the vertex. This means that for the arrow,
our local labelling possibilities are the same as that for the
Huffman-Clowes scheme for the trihedral world. By the same argument,
for the Y junction again we get the same three labelling
possibilities as in the Huffman-Clowes scheme. For L-junctions we
need three faces as well: there are no legal interpretations with
two faces. Here again, the labelling possibilities are the same as
that of the Huffman-Clowes scheme.

For higher-order junctions with 4 and more lines meeting at the
junction we need n faces meeting at the vertex if there are n lines
meeting at the junction. We need to develop an algorithm for
generating the labelling possibilities for such higher order
junctions as may occur in a line drawing, e.g. in overhead views of
square pyramids. We will first need an auxiliary result proved in
[13].

Theorem. If there are n lines meeting at a junction, n > 3,
all the labelling possibilities for the junction which correspond to
the minimum number of faces (n) at the vertex correspond to either
no or one hidden face at the junction.

Now we are ready to generate the labelling possibilities. To
do this, first consider the simpler case: all the labels are convex
or concave. This corresponds to the case when there is no hidden
face. If a labelling is legal it should be possible to construct a
reciprocal figure in gradient space.

Instead of treating this as a geometric construction with ruler
and pencil, we use vector notation. Let u_1, u_2, ..., u_n be the
(outward) unit direction vectors corresponding to the lines which
meet at the junction. Let v_1, v_2, ..., v_n be unit vectors
perpendicular to these lines, corresponding to a counterclockwise
rotation by 90 degrees. Consider the vector equation

    l_1 v_1 + l_2 v_2 + ... + l_n v_n = 0

where l_1, l_2, ..., l_n are n scalars. We will refer to this
equation as the Fundamental Junction Equation. If the reciprocal
figure is constructible, it corresponds to a solution of this
equation. Note that our n faces for n lines assumption is implicitly
buried in this. If there were hidden lines corresponding to two
hidden faces meeting, they would have given rise to additional terms
on the left hand side of the equation.

This equation, being a 2-D vector equation, is actually two
linear equations in the n unknowns l_1, l_2, ..., l_n. A convex
(concave) labelling for a line implies that the corresponding
variable l_i > 0 (< 0). The set of labellings for the lines meeting
at the junction corresponds to a set of constraints on these
unknowns. If the system of linear equations and inequalities has a
feasible solution, the labelling is legal, else it is not. The naive
approach for doing this would be to solve 2^n Linear Programming
problems in order to verify all possible legal labellings. One can
do much better than that: a fast algorithm is described in [15].

Figure 12: Generating junctions with occluding labels

By changing the signs of all the variables in the linear system
described above we see that labellings come in pairs. This is
another way of explaining the Necker ambiguity corresponding to
the convex/concave reversal.
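For a junction with exactly three lines in general position the Fundamental Junction Equation is two equations in three unknowns, so its solutions form a one-parameter family and the sign pattern of (l_1, l_2, l_3) is fixed up to a global flip. The sketch below illustrates this with a closed-form solution built from 2-D cross products; it is not the fast algorithm of [15], the function names are hypothetical, and which sign stands for convex is an assumed convention.

```python
import math


def perpendiculars(angles_deg):
    """Unit vectors v_i: each line direction u_i rotated 90 degrees counterclockwise."""
    vs = []
    for t in angles_deg:
        r = math.radians(t)
        ux, uy = math.cos(r), math.sin(r)
        vs.append((-uy, ux))
    return vs


def cross(p, q):
    """Scalar 2-D cross product."""
    return p[0] * q[1] - p[1] * q[0]


def junction_scalars(angles_deg):
    """One nonzero solution (l1, l2, l3) of l1*v1 + l2*v2 + l3*v3 = 0."""
    v1, v2, v3 = perpendiculars(angles_deg)
    return (cross(v2, v3), cross(v3, v1), cross(v1, v2))


def sign_patterns(angles_deg):
    """The sign pattern of the solution and its Necker-reversed partner."""
    pattern = "".join("+" if x > 0 else "-" for x in junction_scalars(angles_deg))
    flipped = "".join("-" if c == "+" else "+" for c in pattern)
    return pattern, flipped
```

For a Y junction with lines at 90, 210 and 330 degrees all three scalars share a sign, while for an arrow with lines at 210, 270 and 330 degrees the shaft takes the sign opposite to the two barbs. In both cases exactly two patterns arise, related by reversing every sign.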

To find the legal labellings corresponding to one hidden face is
easy. Take a legal all (convex/concave) edge labelling. Each pair
of adjacent lines at the junction defines a sector. Consider a sector
defined by two lines A and B which have both been labelled concave.
Consider the face defined by the corresponding edges. If this face
were hidden, i.e. both these lines corresponded to convex occluding
edges instead of concave edges, the reciprocal figure would remain
the same. One can therefore label A and B as occluding convex
edges with the direction of the arrow such that the sector defined
by A and B is to the left. Figure 12 shows this procedure.

This hierarchical determination of labelling possibilities, first
between convex and everything else in the solution of the
Fundamental Junction Equation and then by subsequent refinement,
is also a good strategy for doing consistency checking. For example
consider arrow-Y paths. At the Fundamental Junction Equation level
the arrow has only 2 labelling possibilities related by a Necker
flip; the same is true for a Y. For an arrow-Y path one can thus
find without search the exactly 2 possible consistent labellings.
See Figure 13 for a nice example.

With hindsight, it appears that Mackworth's decomposition,
first choosing between connect/nonconnect and then for connect
edges between convex/concave, was not efficient from a search
point of view. Similarly Waltz's approach of expanding the label
set by considering shadow edges, rough orientations etc., while
helpful in constraining the final number of interpretations found,
is a bad idea from the search point of view.




Figure 13: Using a collapsed label set to reduce search


10  The  labelling  algorithm 

Ignoring for the moment the invisible 'junction' in Figure 10,
conceptually the labelling algorithm is straightforward. The local
possibilities at each junction with <= 3 lines have been enumerated
in Sections 6-9 and for multi-junctions, the labellings can be
computed by the procedure described in Section 9.1. Consistency
is forced by requiring the label at each end of the line to be the
same.

In the previous section, we saw how by collapsing the label
set, a great speedup could be obtained. That is just a special
case of the notion of relaxed constraint satisfaction problems [15]
which can lead to great speedup (in the heuristic sense) for certain
kinds of problems, line labelling being one of them. These ideas
are used to develop a fast algorithm:

1. Split the line drawing at T-junctions and find connected
   components. Each component can then be labelled independently.

2. Label the drawing with the label set { limb, edge }. For
   each of these labellings perform steps 3-6 below.

3. Introduce phantom nodes on arcs corresponding to curved
   edges.

4. Label the edges with the label set { convex, non-convex }.
   For each of these labellings perform steps 5-6.

5. For each junction find labelling possibilities consistent with
   the previously assigned coarse labels.

6. Perform a node and arc consistency filtering followed by a
   backtrack search to generate all labellings.
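Step 6 can be sketched as follows. This is a minimal illustration under assumptions, not the implementation evaluated in the paper: each junction carries a list of candidate labellings, each candidate is a dict from line name to label, and two candidates are compatible when every line they share has the same label at both ends. The data layout and function names are hypothetical.

```python
def compatible(la, lb):
    """Two labellings agree if every line they share carries the same label."""
    return all(lb[line] == lab for line, lab in la.items() if line in lb)


def filter_domains(domains):
    """Arc-consistency filtering: drop a junction labelling that has no
    compatible labelling at some other junction; repeat until stable."""
    domains = {j: list(ls) for j, ls in domains.items()}
    changed = True
    while changed:
        changed = False
        for j in domains:
            for other in domains:
                if other == j:
                    continue
                kept = [la for la in domains[j]
                        if any(compatible(la, lb) for lb in domains[other])]
                if len(kept) != len(domains[j]):
                    domains[j] = kept
                    changed = True
    return domains


def all_labellings(domains):
    """Backtrack search; yields one line->label dict per consistent labelling."""
    order = list(domains)

    def extend(i, assignment):
        if i == len(order):
            yield dict(assignment)
            return
        for la in domains[order[i]]:
            if compatible(la, assignment):
                yield from extend(i + 1, {**assignment, **la})

    yield from extend(0, {})
```

A toy usage with two junctions sharing the line "b":

```python
domains = {
    "J1": [{"a": "+", "b": "+"}, {"a": "-", "b": "-"}],
    "J2": [{"b": "+", "c": "-"}, {"b": ">", "c": "+"}],
}
solutions = list(all_labellings(filter_domains(domains)))
```

Filtering first shrinks each junction's candidate list, so the backtrack search has fewer branches to explore.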

This algorithm was implemented and tested on several
hand-input line drawings. It had earlier been feared that there
would be a large number of interpretations due to the phantom
nodes. The labellings we obtained substantially corresponded to
intuitive interpretations. The most common ambiguity was between
concave and occluding convex edges, corresponding to the inability
to decide whether an object was stuck to a table or wall, or
floating in air. Unlike what has been common in line drawing
labelling work, we did not assume that there is a background such
that all lines bordering it are occluding. This is merely a
heuristic: it fails for holes. To take some examples, the line
drawing of a cylinder has two labellings corresponding respectively
to a cylinder floating in air or a cylinder resting on a table. We
will close by giving the possible labellings of a curved object.

Figure 14: Possible labellings of a Curved Object




11  Evaluation 


We urge the reader to judge our performance with respect to the
criteria in Section 3. In our opinion we have met criteria 1-3 and
5. On criterion 4 our performance is good, but we do not yet have
a rigorous realizability test like that of Sugihara for polyhedra.
This is the subject of ongoing work. On criterion 6, our performance
is poor, because the scheme relies on being able to segment at
tangent/curvature discontinuities. However this scheme could be
combined with the use of local image intensity information in a
verification step, making it more robust.

Acknowledgements 

I wish to thank Thomas O. Binford and Christos
H. Papadimitriou for many criticisms and discussions. This
work was supported by ARPA contract N00039-84-C-0211, NASA
contract MEA SC-1M28 and by an IBM fellowship.


References 

[1] Barrow, H.G. and J.M. Tenenbaum, "Interpreting line drawings
as three-dimensional surfaces," Artificial Intelligence,
17 (1981), 75-116.

[2] Binford, Thomas O., "Inferring surfaces from images,"
Artificial Intelligence, 17 (1981), 205-244.

[3] Brady, Michael and Alan Yuille, "An Extremum Principle
for Shape from Contour," Proceedings of IJCAI-8 (Karlsruhe:
August 1983), 969-972.

[4] Chakravarty, I., "A generalized line and junction labeling
scheme with applications to scene analysis," IEEE Trans. Pattern
Anal. Machine Intelligence, 1(2) (1979), 202-205.

[5] Clowes, M.B., "On seeing things," Artificial Intelligence, 2
(1971), 79-116.

[6] Draper, Stephen W., "Reasoning about depth in line-drawing
interpretation," PhD thesis, Sussex University, 1980.

[7] Huffman, D.A., "Impossible objects as nonsense sentences,"
Machine Intelligence, 6 (1971), 295-323.

[8] Kanade, T., "Recovery of the three-dimensional shape of an
object from a single view," Artificial Intelligence, 17 (1981),
409-460.

[9] Kirousis, L. and C.H. Papadimitriou, "The Complexity of
Recognizing Polyhedral Scenes," Stanford Tech. Rept., 1984.


[13] Mackworth, A.K., "Consistency in Networks of Relations,"
Artificial Intelligence, 8 (1977), 99-118.

[14] Mackworth, A.K. and E.C. Freuder, "The Complexity of Some
Polynomial Network Consistency Algorithms for Constraint
Satisfaction Problems," Artificial Intelligence, 25 (1985),
65-74.

[15] Malik, Jitendra, "Interpreting Line Drawings of Curved
Objects," PhD Thesis, Stanford University, 1985.

[16] Marr, David, Vision (San Francisco: W.H. Freeman and Co.,
1982).

[17] Shapira, Ruth and Herbert Freeman, "Computer Description
of Bodies Bounded by Quadric Surfaces from a Set of Imperfect
Projections," IEEE Trans. on Computers, September 1978.

[18] Sugihara, K., "Quantitative analysis of line drawings of
polyhedral scenes," Proc. Fourth Int. Joint Conference on Pattern
Recognition (Kyoto, 1978), 771-773.

[19] Turner, K.J., "Computer perception of curved objects using
a television camera," PhD Dissertation, Edinburgh University,
Edinburgh, Scotland, 1974.

[20] Waltz, D., "Understanding line drawings of scenes with
shadows," The Psychology of Computer Vision, Ed. P.H. Winston
(McGraw-Hill, 1975).

[21] Whitney, H., "Singularities of mappings of Euclidean Spaces,
I: Mappings of the plane into the plane," Ann. Math. 62
(1955), 374-410.

[22] Witkin, Andrew P., "Intensity-based edge classification,"
Proceedings of AAAI-82, Pittsburgh, August 1982, 36-41.

[11] Koenderink, J.J. and A.J. van Doorn, "The shape of smooth
objects and the way contours end," Perception, 11 (1982),
129-137.




Solving the Depth Interpolation Problem
with the Adaptive Chebyshev Acceleration Method
on a Parallel Computer

Dong J. Choi and John R. Kender
Department of Computer Science
Columbia University
New York, N.Y. 10027


Abstract 

This paper discusses solving the depth
interpolation problem using the adaptive Chebyshev
acceleration method on a parallel computer to
speed convergence. Many low level computer vision
problems, including depth interpolation, can be cast
as solving a linear system with a symmetric positive
definite (SPD) matrix. Usually, the resulting SPD
matrix is sparse. We first show the derivation of the
adaptive Chebyshev acceleration method when
applied to any SPD matrix. We show further how
the adaptive Chebyshev acceleration method for
sparse SPD matrices can be run on a particular
parallel architecture (fine grained mesh- and
tree-connected, SIMD), where the Jacobi method is
chosen as the underlying basic iterative method.
Lastly, we show some preliminary simulation results
for synthetic images, and compare them with the
results from one of the commonly used methods,
the Gauss-Seidel method. We also detail our future
plans.


1.  Depth  Interpolation 

Human perception is a vivid one of dense and
coherent surfaces in depth. This suggests that
there exists a visible-surface reconstruction process
that transforms sparse information into a dense
surface representation. The low level visual
processes provide several visual cues to reconstruct
the visible surfaces. In particular, one low level
visual process, stereo, generates depth only at
scattered points; another process, such as
photometric stereo, generates orientation at points
that may be scattered as well. With these sparse
constraints a depth interpolation process would
compute the depth of the visible surfaces at every
point explicitly.

Grimson formulated one approach to the depth interpolation problem
[Grimson 81]. He suggested that given a set of scattered depth
constraints corresponding to points along the zero-crossing contours
of the primal sketch, the surface which "best" fits the known
constraints is that which passes through the known points and
minimizes the expression referred to as the quadratic variation of
the surface. He used a gradient descent method to find such a
surface, but slow convergence rates were observed in his work.


This research was sponsored in part by the Defense Advanced Research
Projects Agency under contract N00039-84-C-0165 and in part by an NSF
Presidential Young Investigator Award.


Terzopoulos worked further on surface representation [Terzopoulos
84]. The discrete form of the visible-surface reconstruction problem
is described as the solution of a large sparse linear system of
equations. The nonzero coefficients of each equation are specified as
summations of computational molecules. Given the depth constraints
and the orientation constraints, a set of computational molecules
computes the nonzero coefficients of the linear system by local
computations involving simple multiplications and additions of nodal
variables in a specified spatial arrangement. Because of the
symmetric nature of the computational molecules, it can be easily
shown that the resulting matrix is symmetric. Furthermore,
Terzopoulos shows the stronger result that the matrix generated is
symmetric and positive definite (SPD). The matrix is also sparse.
Even nodes which are sufficiently distant from a boundary where the
depth is discontinuous interact with only 12 neighbors, all of them
at most 2 nodes away. Terzopoulos used a multi-grid approach with the
Gauss-Seidel relaxation method at each relaxation sweep to speed up
the convergence rate.

In this paper, we follow Terzopoulos' formulation of visible surface
reconstruction and use the computational molecules proposed by him.
However, we present an alternative depth interpolation process using
the adaptive Chebyshev acceleration method, which speeds convergence
and is amenable to certain classes of parallel computers. We were led
to this form of acceleration by the observation of [Traub 84] that
the iterative methods on one of which it is based, the Chebyshev and
conjugate gradient methods, are provably optimal in terms of
computational complexity.


In the previous section, the depth interpolation problem has been
cast as solving a set of linear equations,

$$A u = b \qquad (1)$$

where $A$ is SPD.

Using one of several known basic iterative methods, it can be solved
by the matrix equation

$$u^{(n+1)} = G u^{(n)} + k, \quad n = 0, 1, 2, \ldots \qquad (2)$$

We first review the mathematics involved in matrix iteration. The
interested reader can find further details in [Young 81].




There are several well known basic iterative methods: the Jacobi
method, the Gauss-Seidel method, the successive overrelaxation (SOR)
method, and the symmetric successive overrelaxation (SSOR) method.
However, methods other than these basic iterative methods are used in
practice because of the slow convergence rates of the basic iterative
methods. The rates of convergence can be accelerated by two major
classes of acceleration: polynomial acceleration methods or
nonpolynomial acceleration methods. Note that the multi-grid method
used by Terzopoulos is one of the nonpolynomial acceleration methods.


In our work, we implemented two adaptive Chebyshev acceleration
methods as given in [Young 81]. In one algorithm (Algorithm 6-4.1 on
page 107), the initial estimate $m_E$ of $m(G)$, the smallest
eigenvalue of $G$, is input and is not changed throughout the
computation. The estimate $M_E$ of $M(G)$, the largest eigenvalue of
$G$, is updated upward. In the other algorithm (Algorithm 6-5.1 on
page 117), estimates of both eigenvalues are updated. When the
initial $m_E$ is too high, it is adjusted downward. If not, it is not
changed. The other estimate, $M_E$, is updated in the same fashion as
in the previous algorithm. As both values approach their true values,
the algorithm's rate of convergence increases.


The iterative method in (2) is symmetrizable if for some nonsingular
matrix $W$ the matrix $W(I - G)W^{-1}$ is SPD. If the iterative
method is symmetrizable, it can be shown that the largest eigenvalue
$M(G)$ of $G$ is less than 1.0; this provides us with a useful upper
bound for this critical quantity on which speed of convergence
depends. Furthermore, other properties of the matrix $G$ turn out to
be sufficient for the effective use of polynomial acceleration
methods, such as the Chebyshev or the conjugate gradient methods.

In general, the Jacobi method and the SSOR method are symmetrizable
while the Gauss-Seidel method and the SOR method are not. However,
note that it can be shown that the Gauss-Seidel method always
converges when $A$ and $D$ are SPD, where $D$ is a diagonal matrix
whose diagonal elements are taken from the matrix $A$.

In our implementation, we chose the Jacobi method as the underlying
basic iterative method since it is much simpler than the SSOR method,
another symmetrizable basic iterative method. In particular, the
sparsity of the matrix is preserved, which lends itself efficiently
to parallel computation, described in detail in the next section. In
the Jacobi method, the matrix $G$ is related to the matrix $A$ by

$$G = (g_{i,j}), \qquad g_{i,j} = \begin{cases} 0 & \text{if } i = j \\ -a_{i,j} / a_{i,i} & \text{if } i \neq j \end{cases} \qquad (3)$$

When $A$ is SPD, the Jacobi method is symmetrizable with
$W = D^{1/2}$.
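
As a small illustration of the relation above, the following Python
sketch builds the Jacobi iteration matrix G and the vector k (for the
Jacobi method, k is D^{-1} b) from a dense matrix A, and forms
W(I - G)W^{-1} with W = D^{1/2}, which reduces to
D^{-1/2} A D^{-1/2}. The function names and the 3 x 3 matrix are
illustrative assumptions and are not taken from the paper; on the
parallel machine G is never formed explicitly.

```python
def jacobi_iteration(A, b):
    """Return (G, k) so that the Jacobi step for A u = b is u_next = G u + k."""
    n = len(A)
    G = [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    k = [b[i] / A[i][i] for i in range(n)]
    return G, k


def symmetrized(A):
    """Return D^{1/2} (I - G) D^{-1/2}, which equals D^{-1/2} A D^{-1/2}."""
    n = len(A)
    d = [A[i][i] ** 0.5 for i in range(n)]
    return [[A[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


# Hypothetical SPD matrix (symmetric, strictly diagonally dominant).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [3.0, 2.0, 1.0]

G, k = jacobi_iteration(A, b)
S = symmetrized(A)
```

In this example G itself is not symmetric (G[1][2] is 0.25 while
G[2][1] is 0.5), but the transformed matrix S is, which is the point
of the symmetrization.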

In our work, we chose the adaptive Chebyshev method as the method of
polynomial acceleration. The convergence rate of this method is
fastest when the largest eigenvalue, $M(G)$, and the smallest
eigenvalue, $m(G)$, of the iteration matrix $G$ for the related basic
method are known.

Lee investigated using the pure Chebyshev method on several low level
vision problems [Lee 85]. For shape from shading and optical flow
problems, he calculated the lower and upper bounds of the smallest
and largest eigenvalues. In the depth interpolation problem we
investigated, we could not estimate the bounds of the eigenvalues due
to the flexible nature of the matrix. This is because the matrix is
sensitive to the shape of the underlying region and the physical
locations of known depth values. For example, at nodes where the
depth constraints exist, both the right and the left hand side of the
matrix equation, $Au = b$, are modified. In general, the optimal
estimates of the eigenvalues are not known a priori but can be
determined by using the adaptive Chebyshev acceleration method.


When we have a case where $M_E < M(G) < 1.0$ and $m_E < m(G)$, one of
the ways to compute the initial estimate of $m(G)$ is by computing a
reasonable lower bound based on the matrix norm, that is,
$m(G) \geq -\|G\|_\infty$.

For the details of the algorithms, the reader is referred to [Young
81]. In the next section, where we discuss the possible
implementation on a parallel architecture, some computational steps
of the adaptive Chebyshev acceleration method will be discussed in
more detail.


3. Implementation on Parallel
Architecture

Here, the proposed solution demands three characteristics of the
supporting hardware. As we have shown in the previous section, we use
the Jacobi method, which is symmetrizable, to convert the $A$ matrix
to the $G$ matrix. Since we use the Jacobi method as the underlying
basic iterative method, each matrix iteration should occur
simultaneously for every node. The first property then is that it
naturally leads to a single instruction multiple data stream (SIMD)
mode of execution.

Secondly, the matrix $G$ is sparse. In particular, as we have noted,
in the depth interpolation problem even a node far removed from the
region boundary interacts only with 12 neighboring nodes. Therefore
mesh interconnections between nodes are sufficient for handling all
the communication needs for multiplication by $G$.

Thirdly, what is needed is a fast global summary capability. In the
adaptive Chebyshev acceleration method we calculate various matrix
and vector norms either to test for convergence or to obtain better
estimates of the eigenvalues. This communication need can be met well
by a tree topology, superimposed on the underlying mesh.

Two parallel architectures support these three needs well. Both
NON-VON and the Connection Machine support SIMD control and mesh
interconnections. In NON-VON, a tree topology is explicitly
supported; see [Shaw 84]. In the Connection Machine, a binary N-cube
is superimposed on the mesh, but a tree structure can be simulated by
software, such as the calculated tree mentioned in [Christman 84].




3.1. Parallelization

The parallelization of the adaptive Chebyshev acceleration
computation is now discussed in detail. The computation proceeds in
two stages: pre-computation, and iterations. At the pre-computation
stage, we compute the matrix $A$ using a set of computational
molecules in SIMD fashion with four types of given inputs: depth
discontinuities, depth constraints, orientation discontinuities, and
orientation constraints. For each node, it computes the necessary
multiplication factors for each of 12 neighboring nodes and itself.
The right-hand side vector $b$ is also computed at this time. Once
the matrix $A$ is computed, an initial estimate of $m(G)$ is also
computed using the tree connections by calculating the sup-norm of
$G$, $\|G\|_\infty$.


At each iteration, computation goes through several steps. Here, our
attention is focused on the calculation of the next iterate. The
major computations are [Young 81]:

$$\delta = G u_{cur} + k - u_{cur}$$

$$DELNP = \|\delta\|_2, \qquad DELNE = \|\delta\|_\beta, \qquad YUN = \|u_{cur}\|_\eta$$

$$u_{nxt} = \rho (\gamma \delta + u_{cur}) + (1.0 - \rho) u_{prv}$$


Computation of $u_{nxt}$, $u_{cur}$ and $u_{prv}$ is straightforward.
Each node stores each of these numbers so that a simple SIMD
execution will update each one, independent of other nodes. No
communication is needed. The values $\rho$ and $\gamma$ are iteration
parameters of the Chebyshev method. They are determined by the
current estimates of the eigenvalues, $m_E$ and $M_E$.
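
The update of the next iterate can be sketched as follows. This is a
minimal, non-adaptive sketch: the eigenvalue estimates m_E and M_E
are held fixed, whereas the algorithms of [Young 81] revise them
during the run. The formulas used for gamma, sigma, and the
recurrence for rho are the standard Chebyshev acceleration parameters
as we recall them, and should be checked against [Young 81]. The
function names, the tolerance, and the 2 x 2 test system are
illustrative assumptions.

```python
def matvec(G, u):
    """Dense matrix-vector product G u."""
    return [sum(g * x for g, x in zip(row, u)) for row in G]


def chebyshev_solve(G, k, m_E, M_E, tol=1e-10, max_iter=500):
    """Chebyshev acceleration of u = G u + k with fixed estimates m_E, M_E.

    Returns (u, number_of_iterations_performed).
    """
    gamma = 2.0 / (2.0 - M_E - m_E)
    sigma = (M_E - m_E) / (2.0 - M_E - m_E)
    n = len(k)
    u_prv = [0.0] * n
    u_cur = [0.0] * n
    rho = 1.0
    for it in range(1, max_iter + 1):
        Gu = matvec(G, u_cur)
        # Pseudoresidual: delta = G u_cur + k - u_cur.
        delta = [Gu[i] + k[i] - u_cur[i] for i in range(n)]
        if max(abs(d) for d in delta) < tol:
            return u_cur, it - 1
        # rho_1 = 1, rho_2 = 1/(1 - sigma^2/2), then 1/(1 - sigma^2 rho/4).
        if it == 1:
            rho = 1.0
        elif it == 2:
            rho = 1.0 / (1.0 - 0.5 * sigma * sigma)
        else:
            rho = 1.0 / (1.0 - 0.25 * sigma * sigma * rho)
        u_nxt = [rho * (gamma * delta[i] + u_cur[i]) + (1.0 - rho) * u_prv[i]
                 for i in range(n)]
        u_prv, u_cur = u_cur, u_nxt
    return u_cur, max_iter


# Jacobi form of the hypothetical system [[2, -1], [-1, 2]] u = [1, 1],
# whose exact solution is u = [1, 1]; G has eigenvalues -0.5 and 0.5.
G = [[0.0, 0.5], [0.5, 0.0]]
k = [0.5, 0.5]
u, iterations = chebyshev_solve(G, k, m_E=-0.5, M_E=0.5)
```

Each vector update inside the loop touches only the values stored at
one node, which is why the text can assign it to a plain SIMD step
with no communication.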


Calculation of the 2-norm of $\delta$, the $\beta$-norm of $\delta$,
or the $\eta$-norm of $u_{cur}$ is handled well using a tree
topology. Usually, the $\beta$-norm or $\eta$-norm is either a 2-norm
or a sup-norm. When the 2-norm is needed, every element in each
processing element (PE) is multiplied with itself and the summation
of squared numbers is carried out from the bottom to the top of the
tree one level at a time. The square root of the final resulting sum
obtained at the top is the value desired. (When the size of the mesh
at the bottom of the tree is n x n, the entire process takes log n
steps.) When the sup-norm is desired, each PE calculates the absolute
value of the element in it, and then compares its own value against
those values of its own two sons. Both the comparison and the
retaining of the biggest value are similarly carried out from the
bottom to the top of the tree. The single value obtained at the top
is the sup-norm desired. (This too is a log n process.)
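
The bottom-to-top reduction can be imitated sequentially as below.
This is only a sketch under a simplifying assumption: values live at
the leaves of a binary tree and are combined pairwise, one level per
pass, whereas in the machine described in the text the internal nodes
are PEs that hold values of their own. The function names are
hypothetical.

```python
import math


def tree_reduce(values, combine):
    """Combine values pairwise, one tree level per pass.

    Returns (result, number_of_levels_traversed).
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        level = [combine(level[i], level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
        levels += 1
    return level[0], levels


def two_norm(v):
    """Square locally, sum up the tree, take the square root at the top."""
    total, levels = tree_reduce([x * x for x in v], lambda a, b: a + b)
    return math.sqrt(total), levels


def sup_norm(v):
    """Take absolute values locally, keep the larger value at each level."""
    return tree_reduce([abs(x) for x in v], max)
```

For 16 leaf values both reductions finish in 4 passes, which
illustrates the logarithmic step count mentioned in the text.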

Finally, we are left with the calculation that involves the matrix
multiplication operation. Computation of the pseudoresidual vector
$\delta$ can be done in SIMD fashion using mesh interconnections
only. The only step remaining at this point is the computation of the
matrix multiplication term, $G u_{cur}$. In the previous section, we
have already seen that the Jacobi method is parallel (i.e., it
simultaneously displaces old values with new values). Therefore,
iterations based on the Jacobi method can be carried out in SIMD
fashion with mesh interconnections to assemble current depth values
of neighboring nodes. Furthermore, all the coefficients that
contribute to this assembly are in the form

$$g_{i,j} = -a_{i,j} / a_{i,i}$$

since $i$ is not equal to $j$. Since the factor in the denominator,
$a_{i,i}$, is common to all neighboring nodes, division by it is done
only once, as the last step. By using the Jacobi method, neither
explicit pre-computation of the matrix $G$ nor any particular
sophisticated ordering of matrix elements is needed. Put in other
words, only local computation supported by the mesh topology is
needed to carry out iterations when the vector $u_{cur}$ is
multiplied by the matrix $G$.
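
The local computation described above can be sketched per node. Each
node keeps only the coefficients a_ij of its own neighbors and its
diagonal entry a_ii, accumulates the products, and divides once at
the end. The data layout (one dictionary of neighbor coefficients per
node) and the three-node chain are illustrative assumptions; the
paper's nodes have up to 12 neighbors on a two-dimensional mesh.

```python
def jacobi_product(neighbors, diag, u):
    """Return G u for the Jacobi matrix G using only local neighbor data.

    neighbors[i] maps a neighbor index j to the coefficient a_ij;
    diag[i] holds a_ii.
    """
    result = []
    for i, links in enumerate(neighbors):
        acc = 0.0
        for j, a_ij in links.items():
            acc -= a_ij * u[j]        # multiply-accumulate over neighbors
        result.append(acc / diag[i])  # single division by a_ii, done last
    return result


# Hypothetical three-node chain with off-diagonal coefficients of -1.
neighbors = [{1: -1.0}, {0: -1.0, 2: -1.0}, {1: -1.0}]
diag = [4.0, 4.0, 2.0]
u = [1.0, 2.0, 3.0]
gu = jacobi_product(neighbors, diag, u)
```

Because every node reads only the previous values of its neighbors,
the loop over nodes can run in any order or all at once, which is the
property that makes the Jacobi step suitable for SIMD execution.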


4.  Simulation  Results 

We have extended the existing NON-VON simulator [Hussein 85] to
handle floating-point operations. It was straightforward to implement
and test the SIMD control, mesh connections, and tree topology
aspects of the computation. We later simulated the Gauss-Seidel
method, which requires only mesh connections, for comparison.

In our preliminary simulation work, the synthetic image was a
constant depth plane (u = 1.0). The shape of the boundary of the
plane was a square, with size 10 x 10. The depth constraints were
scattered randomly over the plane. The densities of the depth
constraints were 15%, 30%, and 50%.

For the bulk of our simulation work, we used the first Young
algorithm (Algorithm 6-4.1 in [Young 81]). In this algorithm, the
estimate of $M(G)$ is updated upward during iterations but the
estimate of $m(G)$ is not changed at all. When the estimates are not
optimal, the convergence rate is not fast at the beginning. When the
estimate $M_E$ is close enough to $M(G)$, the convergence is fast
even in the case that $m_E$ is not close enough to $m(G)$; in the
adaptive Chebyshev acceleration method, $M_E$ is more critical than
$m_E$. Therefore, before embarking on this algorithm, we carried out
an experiment to see how much we lose by running this algorithm with
the reasonable and simple-to-compute initial estimates of $M_E = 0$
and $m_E = -\|G\|_\infty$ [Young 81]. In the experiment, we used a
10 x 10 image where 50% of the nodes were constrained. For the
initial value of $m_E$, the calculated lower bound was -3.0. In this
case, the algorithm took 287 steps to converge. In contrast, we ran
the algorithm a second time, using the best problem-specific
estimates obtainable: -0.4 for the initial value of $m_E$ and 0.975
for the initial value of $M_E$. The run with better initial estimates
took 145 steps, i.e., only half of the time, to converge. This result
should be taken as a rather conservative one. With bigger images, the
difference in the number of steps to convergence should become
smaller, i.e., the adaptive Chebyshev acceleration method arrives at
the optimal convergence rate relatively quickly.

(A note on eigenvalues: To get these best estimates, we used the
other algorithm of Young (Algorithm 6-5.1 in [Young 81]). In this
algorithm, the estimate of $M(G)$ is updated upward while the
estimate of $m(G)$ is updated downward if the current estimate,
$m_E$, is bigger than the smallest eigenvalue, $m(G)$. To obtain the
best estimate of $M(G)$, we picked a number that was slightly smaller
than the final estimate at convergence; to obtain the best estimate
of $m(G)$, we picked a large enough number, say, -0.1. Then we ran
the algorithm repeatedly and checked whether the current estimate was
updated downward while the algorithm was running. We continued this
until both estimates were not updated at all throughout the entire
computation. We also verified that the adaptive Chebyshev
acceleration method took the smallest number of steps to converge
when these best estimates were input as the initial values.)

To compare our approach with existing methods, we also ran the same
image (10 x 10 with 50% depth-constraint density) with the
Gauss-Seidel method. In general, the depth values approximate their
final positions more rapidly in the beginning for the Gauss-Seidel
method, but eventually the adaptive Chebyshev acceleration method
catches up. Thereafter, the depth values are closer to ideal values
for the adaptive Chebyshev acceleration method. In other words,
eventually the Gauss-Seidel method converges more slowly. It took 304
iterations for the Gauss-Seidel method to converge. Recall that it
took 287 iterations for the adaptive Chebyshev acceleration method to
converge, while the adaptive Chebyshev acceleration method with the
best estimates took 145 iterations to converge.


4.1.  Convergence  Properties 

To measure convergence, the convergence criterion for the adaptive
Chebyshev acceleration method was applied to the Gauss-Seidel method
as well. The iteration error vector is obtained in the abstract by
subtracting the ideal depth vector from the current depth vector
computed. The iterations are to be terminated whenever some norm of
the error vector becomes sufficiently small. In the adaptive
Chebyshev acceleration method the norm of the error vector is
proportional to the norm of the pseudoresidual vector, provided that
the current estimate, $M_E$, is close enough to $M(G)$. Often, a
relative error measure is desired rather than an absolute error
measure. In this case, the $\beta$-norm of the pseudoresidual vector
divided by the $\eta$-norm of the depth vector is compared to the
stopping criterion number instead of the $\beta$-norm of the
pseudoresidual vector alone. For more detailed discussion of the
convergence criterion of the adaptive Chebyshev acceleration method,
the reader is referred to [Young 81]. Since we extended the
convergence criterion of the adaptive Chebyshev acceleration method
to the Gauss-Seidel method, the number of iterations may not be the
best measure for comparison.
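
A relative stopping test of the kind described above can be sketched
as follows. The sketch assumes the same norm is used for the
pseudoresidual and for the depth vector (the sup-norm by default) and
omits any factor that depends on the eigenvalue estimate; the exact
test is given in [Young 81]. The names relative_test and zeta are
hypothetical.

```python
def sup_norm(v):
    """Largest absolute component of a vector."""
    return max(abs(x) for x in v)


def relative_test(delta, u, zeta, norm=sup_norm):
    """True when norm(delta) / norm(u) <= zeta.

    Written as a product so that a zero depth vector does not divide by zero.
    """
    return norm(delta) <= zeta * norm(u)
```

For example, with a pseudoresidual of [1e-7, -2e-7] and depths of
[1.0, 1.0], the test passes for zeta = 1e-6 and fails for
zeta = 1e-8.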

Thus, we looked into another measure of comparative performance. When
the Gauss-Seidel method was stopped after the same number of
iterations as the adaptive Chebyshev acceleration method took to
converge, it was observed that the final depth values obtained from
the adaptive Chebyshev acceleration method were closer to the ideal
depth values of the synthetic image for most of the nodes. Returning
to the experiment we mentioned before (10 x 10 image with 50%
depth-constraint density), for 84 nodes out of 100, the adaptive
Chebyshev acceleration method produced better depth values.

However, an anomaly of convergence was observed as well. For the 50%
density case, the average depth value was slightly further from the
ideal depth value for the adaptive Chebyshev acceleration method. It
was due to the fact that some remaining nodes did not converge well
for the adaptive Chebyshev acceleration method. This phenomenon was
more apparent when we examined the minimum and the maximum final
depth values. When the density got lower, it got worse. For the
Gauss-Seidel method, the convergence rate was rather slow but the
final depth values obtained were much more uniform. On close
inspection, it was revealed that those ill-behaving nodes were
clustered along a row or a column where depth constraints did not
exist along the line.

The numerical values are summarized in Table 1. There, useful
measures are listed for three different densities of depth
constraints for both the adaptive Chebyshev acceleration and the
Gauss-Seidel method. They are the number of iterations, the sup-norm
of the pseudo-residual vector, the minimum, the maximum, and the
average of the final depth values, and the final estimate $M_E$ (for
the adaptive Chebyshev acceleration method only).


5.  Conclusion  and  Future  Plans 

In this paper, we showed how the adaptive Chebyshev acceleration
method can be applied to parallel computers for those computer vision
problems where the resulting matrix is SPD. Basically, the speed-ups
of matrix iteration have been achieved by two factors. First, we used
a theoretically better method, i.e., the adaptive Chebyshev
acceleration method. Second, all computational steps have been
parallelized.

In this paper, a synthetic image with constant depth was described.
Other synthetic images will be pursued in the future. Our
implementation examples were quite small compared to real images.
Currently our simulator runs on both DEC-20 and VAX 11/750. By
transporting our software to a bigger machine, we will be able to
handle bigger images.

We have compared our simulation results with the Gauss-Seidel method;
our results will also be compared with the results obtained from the
multi-grid method. We also intend to solve these problems using the
pure iteration methods on which the adaptive Chebyshev acceleration
method is based, namely the pure Chebyshev method and the conjugate
gradient method. Thus, we will compare these approaches, some of
which are more suited to particular architectures.




Acknowledgements 

D. E. Shaw's laboratory has provided us with the simulator and the
prototype machines, which has made this research possible. G. W.
Wasilkowski provided valuable comments on the mathematics.




References 

[Christman 84] Christman, D. P., Programming the Connection Machine,
Master's thesis, Massachusetts Institute of Technology, January 1984.

[Grimson 81] Grimson, W. E. L., From Images to Surfaces, MIT Press,
1981.

[Hussein 85] Hussein, A. H. I., Image Understanding Algorithms on
Fine-Grained Tree-Structured SIMD Machines, Ph.D. Thesis, Columbia
University, 1984.

[Lee 85] Lee, D., Contributions to Information-based Complexity,
Image Understanding and Logic Circuit Design, Ph.D. Thesis, Columbia
University, 1985.

[Shaw 84] Shaw, D. E., Organization and Operation of a Massively
Parallel Machine, Tech report, Columbia University, 1984.

[Terzopoulos 84] Terzopoulos, D., Multiresolution Computation of
Visible-surface Representations, Ph.D. Thesis, Massachusetts
Institute of Technology, January 1984.

[Traub 84] Traub, J. F., and Wozniakowski, H., On the Optimal
Solution of Large Linear Systems, Journal of the ACM 31(3): 545-559,
1984.

[Young 81] Young, D. M. and Hageman, L. A., Applied Iterative
Methods, Academic Press, 1981.


Results from the adaptive Chebyshev acceleration method

density  steps  $\|\delta\|_\infty$  $u_{min}$    $u_{max}$    $u_{avg}$    $M_E$
15%      1041   [illegible]          .13080117    [illegible]  .84147547    .199079
30%      1171   [illegible]          [illegible]  [illegible]  [illegible]  .808017
50%      287    [illegible]          [illegible]  [illegible]  .80980704    .080380

Results from the Gauss-Seidel method

density  steps        $\|\delta\|_\infty$  $u_{min}$    $u_{max}$    $u_{avg}$
15%      [illegible]  1.0810E-8            .80097084    1.0000081    .88909444
30%      981          [illegible]          .08888441    1.0000111    .99999494
50%      304          [illegible]          .08946197    1.0001079    .99967541

Table 1: Simulation Results for the 10 x 10 image


First Results on Outdoor Scene Analysis
Using Range Data

Martial Hebert and Takeo Kanade

Robotics Institute and Computer Science Department
Carnegie-Mellon University
Pittsburgh, PA 15213


Abstract 

This paper describes some techniques for outdoor scene analysis using
range data. The purpose of these techniques is to build a 3-D
representation of the environment of a mobile robot equipped with a
range sensor. Algorithms are presented for scene segmentation, object
detection, and map building.

We present results obtained in an outdoor navigation environment in
which a laser range finder is mounted on a vehicle. These results
have been applied to the problem of path planning through obstacles.

1. Introduction

This paper presents some techniques developed for outdoor scene
analysis using range data. The advantages of using such range data
fall in two categories. First, range data is less sensitive to
environmental conditions, such as lighting, thus alleviating shadow
or highlight problems. Second, range data directly provides
information about the geometry of the scene, such as the position of
a particular feature, thus suppressing the calibration and
registration steps required with camera data. This property is
especially important in the area of mobile robots in which the output
of the vision programs must be converted into usable space
coordinates. For our work, we use the ERIM laser range sensor which
provides reasonable accuracy, a working space large enough for
outdoor applications, and high acquisition speed.

This research is supported by the Defense Advanced Research Projects
Agency (DOD) and monitored by the Office of Naval Research under
Contract N00014-82-K-0193 and by contract DACA76-85-C-0003 issued by
the U.S. Army Engineer Topographic Laboratories.


Following the description of the ERIM range sensor, we present
preprocessing techniques for removing sensor dependent defects in
section 3. In section 4, we present the 3-D features extraction
algorithms which are designed to produce relevant features for
outdoor vehicle navigation and object recognition. 3-D map
reconstruction from range data is described in section 5.

2.  Sensor  Description 

The range sensor we use for outdoor scene analysis has been designed
by the Environmental Research Institute of Michigan and will be
referred to as the ERIM sensor. The basic principle of the sensor is
to determine the range from the sensor to the scene point for each
pixel by measuring the transmit time of a modulated laser beam. The
transmit time is derived by measuring the phase difference between
the reference and reflected signals, which corresponds to the range
from the sensor to the target. A two-mirror scanning mechanism
directs the beam onto the scene so that an image of the scene is
produced. In the current version, the field of view is ±40° in the
horizontal plane and 30° in the vertical plane, from 15° to -15°. The
resulting range image is a 64 x 256 image. The frame rate is
currently two images per second (1/2 second/frame). The nominal range
noise is 0.4 feet at 50 feet.

Since only the phase shift is measured, the resulting values are
relative instead of absolute measurements. That is, two points
separated by a length equal to a complete phase shift have the same
range value. This critical length is called the ambiguity interval
and is equal to 64 feet.

The sensor is also capable of producing reflectance images in which
the value of each pixel is the amount of light reflected by the
target. This information has not been used yet.




Figure 2-1 shows a sequence of eight ERIM images taken in a park; two
consecutive images are taken from positions separated by
approximately five meters.


Figure 2-1: A sequence of ERIM images

3.  Preprocessing 

The ERIM data includes a periodicity problem due to the ambiguity
interval. The periodicity is especially apparent in images such as
the one shown in figure 3-1 in which distant points have the same
value as close points. This problem reduces the range at which the
scene can be processed to the extent of the ambiguity interval, and
may also create false features, such as false edges which do not
correspond to any physical feature. Therefore, the first step in the
ERIM image processing is to remove the periodicity. The periodicity
removal algorithm has three steps:

•  Divide the image into connected components so that
two points whose range difference is greater than a
threshold are never connected (figure 3-2). Two
such points belong to two different ambiguity
intervals.

•  Remove the small regions which correspond to
noise.

•  Explore the graph of components starting at the
bottom region which is within the first ambiguity
interval. During the exploration, add an offset to all
the points of the currently visited region. Initially, the
offset is zero, then it is incremented by 256 each time
a region above the current region is visited. The
result of the correction is shown in figure 3-3.

Figure 3-2: Ambiguity intervals

Figure 3-3: Corrected image

We have found that this algorithm works well within the first two
ambiguity intervals. Beyond that point, measurements are usually
too noisy to ensure reliable results. No algorithm is guaranteed to
retrieve the actual range values, since it is unknown whether two
regions are separated by only one or several ambiguity intervals.
The algorithm assumes that only one interval separates two
regions.
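The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default threshold of 100, the use of 4-connectivity, and the choice of the component containing the bottom-left pixel as the starting region are all assumptions, and the small-region removal step is omitted for brevity.

```python
from collections import deque

AMBIGUITY = 256  # offset added per ambiguity interval, as stated in the text


def label_components(img, threshold):
    """4-connected components; neighbors join only if |range diff| <= threshold."""
    rows, cols = len(img), len(img[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(img[nr][nc] - img[r][c]) <= threshold):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count


def remove_periodicity(img, threshold=100):
    """Add a multiple of AMBIGUITY to each component, walking up from the bottom."""
    rows, cols = len(img), len(img[0])
    labels, count = label_components(img, threshold)
    # above[a] holds the components that touch component a from the row above.
    above = {k: set() for k in range(count)}
    for r in range(1, rows):
        for c in range(cols):
            a, b = labels[r][c], labels[r - 1][c]
            if a != b:
                above[a].add(b)
    start = labels[rows - 1][0]  # assumed to lie in the first ambiguity interval
    offset = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in above[a]:
            if b not in offset:
                offset[b] = offset[a] + AMBIGUITY
                queue.append(b)
    # Components never reached from the start keep an offset of zero.
    return [[img[r][c] + offset.get(labels[r][c], 0) for c in range(cols)]
            for r in range(rows)]
```

Like the algorithm described in the text, the sketch assumes that exactly one ambiguity interval separates a region from the region below it.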

4.  3-D Features Extraction and Image Interpretation

In  this  section,  we  describe  the  feature  extraction  techniques. 
The  basic  features  relevant  to  the  outdoor  navigation  problem 
are: 

•  3-D edges, which correspond either to discontinuities
of depth, such as the boundary of an obstacle, or to
discontinuities of surface normals, such as the
boundary between a flat region and a highly curved
region.

•  Smooth regions, which have a low curvature. These
regions are further divided into accessible regions
and obstacle regions.

•  Obstacles, which are either high curvature ground
regions, or objects divided into smooth or
pseudo-planar regions.

The output structure from this feature extraction process is a
connectivity graph of these features. The feature extraction
proceeds by first computing low-level attributes (edges, normals,
and curvatures) from the image, and then merging the
segmentations derived from these attributes into a single
description (figure 4-1).

4.1.  Surface Normals

The surface normals provide important pieces of information
about the shape of the observed terrain. The best way of
computing the surface normals is to approximate the
neighborhood of each pixel by a plane. A straightforward
method would be to minimize for each pixel:

$$\sum_{(i,j) \in N} a_{ij} \left\| \vec{v} \cdot \vec{p}_{ij} - D \right\|^2 \qquad (1)$$

where $N$ is the size of the neighborhood, $\vec{v}$ is the surface unit
normal, $D$ is the normal distance between the origin and the
plane, $a_{ij}$ are weighting factors, and $\vec{p}_{ij}$ are the measured points.
Although simple, this procedure is time-consuming. Moreover, it
does not take into account the fact that the ERIM scanner delivers
the radial distances instead of the Cartesian coordinates. The
alternative and preferred criterion is:

$$\sum_{(i,j) \in N} a_{ij} \left\| \vec{v}\,' \cdot \vec{u}_{ij} - \frac{1}{d_{ij}} \right\|^2 \qquad (2)$$

where $\vec{v}\,' = \vec{v}/D$, $\vec{u}_{ij}$ is the radial vector at pixel (i,j), and $d_{ij}$ is the
distance from pixel (i,j) to the origin.


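The solution of (2) is given by:

$$\vec{v}\,' = M^{-1} \sum_{(i,j) \in N} a_{ij} \frac{\vec{u}_{ij}}{d_{ij}}$$

where $M$ is the 3 x 3 matrix defined by

$$M = \sum_{(i,j) \in N} a_{ij} \, \vec{u}_{ij} \, \vec{u}_{ij}^{\,T} \qquad (3)$$

Since the vectors $\vec{u}_{ij}$ depend only on the scanning parameters
of the sensor, the matrix $M$ can be computed beforehand.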


Figure 4-1: Feature extraction process

Actually, the vectors and matrices depend also on the orientation
of the sensor (pan and tilt angles), but their value can be updated
easily: if the sensor is rotated by a rotation $R$, then the radial
vectors $\vec{u}_{ij}$ and the resulting normal $\vec{v}$ are changed to $R\vec{u}_{ij}$ and $R\vec{v}$,
respectively.

In summary, the estimation of the surface normals proceeds as
follows:

•  If the orientation of the sensor has been changed
since the last image, update the vectors $\vec{u}_{ij}$.

•  Correlate the inverse image $1/d$ with $\vec{u}$, with weights
$a_{ij}$.

•  Multiply the resulting vector image by $M^{-1}$.

•  Normalize the resulting vector to obtain the unit
surface normal.
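The estimation steps can be illustrated on a single neighborhood. The sketch below is an assumption-laden illustration rather than the authors' code: `solve3` and `fit_normal` are hypothetical names, uniform weights are used by default, and the 3 x 3 system is solved directly instead of precomputing the inverse of M.

```python
import math


def solve3(m, b):
    """Solve the 3x3 linear system m x = b by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    x = []
    for k in range(3):
        # Replace column k of m by the right-hand side b.
        mk = [[b[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        x.append(det(mk) / d)
    return x


def fit_normal(units, dists, weights=None):
    """Estimate the unit normal v and plane distance D from radial data.

    units : radial unit vectors u_ij of the neighborhood
    dists : measured ranges d_ij along those vectors
    Minimizes sum a_ij (v' . u_ij - 1/d_ij)^2 with v' = v / D.
    """
    weights = weights or [1.0] * len(units)
    m = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for u, d, a in zip(units, dists, weights):
        for i in range(3):
            rhs[i] += a * u[i] / d          # sum of a_ij * u_ij / d_ij
            for j in range(3):
                m[i][j] += a * u[i] * u[j]  # M = sum of a_ij * u_ij u_ij^T
    vp = solve3(m, rhs)                     # v' = M^-1 * rhs
    norm = math.sqrt(sum(x * x for x in vp))
    return [x / norm for x in vp], 1.0 / norm
```

Because |v'| = 1/D, normalizing v' yields the unit normal and its inverse length yields the distance of the plane from the origin.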




Figure 4-2 shows a range image and the three components of
the surface normal estimated by using a 5 x 5 window.

Figure 4-2: Original range image and surface normals

4.2.  Curvatures 

4.2.1.  Using Principal Curvatures

Several authors have shown that differential geometry, namely
the theory of principal curvatures, can be used to recover
properties of a surface observed by a range sensor [5, 6]. These
properties range from the extraction of roof edges to the
extraction of cylindrical surfaces. However, these techniques do
not work well for outdoor imagery. The prerequisites of these
techniques are that an accurate estimation of second order
differential attributes is possible and that the surfaces are
mathematically well defined. Outdoor imagery, however, has a
limited accuracy, and the observed surfaces usually do not have a
well-defined mathematical representation, since in a natural
environment most surfaces, such as grassy terrain and tree
foliage, are highly textured and irregular. Therefore, we limit
ourselves to the computation of curvatures for the purpose of
roughly segmenting the scene into separate regions, each of
which is a region of low and uniform maximum curvature.

4.2.2.  Computing  Curvatures 

The computation of principal curvatures can be reduced to the
computation of first and second derivatives of the image (see
[2] for a complete definition). The standard way of computing
the curvature is to define the range image as a function
$Z = f(X, Y)$, where the two coordinates $X$ and $Y$ are assumed to be
uniformly distributed along the image rows and columns. This
assumption is not true when using the ERIM sensor because of
the width of the field of view. The solution is to use the spherical
representation $\vec{p} = d\,\vec{u}(\varphi, \theta)$, where $\vec{p}$ is a measured point on the
surface, $d$ is the measured range, and $\vec{u}(\varphi, \theta)$ is the
radial vector of angles $\varphi$ and $\theta$. A surface is considered as a
parametric surface $\vec{p} = \vec{p}(\varphi, \theta)$. The curvatures can be computed
by using the derivatives with respect to the two angles:

$$\frac{\partial \vec{p}}{\partial \varphi} = \frac{\partial d}{\partial \varphi}\,\vec{u}(\varphi, \theta) + d\,\frac{\partial \vec{u}(\varphi, \theta)}{\partial \varphi} \qquad (4)$$

$$\frac{\partial \vec{p}}{\partial \theta} = \frac{\partial d}{\partial \theta}\,\vec{u}(\varphi, \theta) + d\,\frac{\partial \vec{u}(\varphi, \theta)}{\partial \theta}$$

... etc.


The radial vectors $\vec{u}(\varphi, \theta)$ depend only on the characteristics of
the sensor and its orientation. These vectors can therefore be
computed beforehand. The curvatures computation is thus
reduced to the computation of the first and second derivatives of
the image $d$ with respect to the angles $\varphi$ and $\theta$. This computation
is done by first applying a Gaussian smoothing, and then
computing the derivatives by convolving with 3 x 3 masks. These
masks are derived from a second order approximation of
$\vec{p} = \vec{p}(\varphi, \theta)$. In summary, the curvature computation algorithm
proceeds as follows:

•  Apply a Gaussian smoothing on the range image.

•  Compute the derivatives of the range image with
respect to the two spherical angles by applying 3 x 3
operators.

•  Derive the derivatives of the three Cartesian
coordinates by using equations (4).

•  Derive the curvatures by applying the fundamental
forms equations [2].

Figure 4-3 shows a range image and the corresponding
maximum curvature image estimated by using the above
algorithm.
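The last step, going from derivatives of the parametric surface to curvatures, can be sketched with the standard first and second fundamental forms of differential geometry. The function name and argument layout below are assumptions made for illustration; the inputs are the derivative vectors of the surface point with respect to the two parameters (the two scan angles in the text), however they were estimated.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def principal_curvatures(p_u, p_v, p_uu, p_uv, p_vv):
    """Principal curvatures from first and second derivatives of p(u, v)."""
    n = cross(p_u, p_v)
    length = math.sqrt(dot(n, n))
    n = tuple(x / length for x in n)                           # unit surface normal
    E, F, G = dot(p_u, p_u), dot(p_u, p_v), dot(p_v, p_v)      # first form
    L, M, N = dot(p_uu, n), dot(p_uv, n), dot(p_vv, n)         # second form
    denom = E * G - F * F
    K = (L * N - M * M) / denom                                # Gaussian curvature
    H = (E * N - 2 * F * M + G * L) / (2 * denom)              # mean curvature
    root = math.sqrt(max(H * H - K, 0.0))
    return H + root, H - root
```

The maximum curvature used for segmentation can then be taken as the larger of the two returned values in absolute value.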

4.3.  Edges

Edges are computed by detecting the depth jumps. This is
done by first detecting the zero crossings of a difference of
Gaussian masks in the depth image. One difficulty is that if two
pixels are far from the sensor, they may appear as neighbors in
the image even though they may be far apart in space and hence
do not correspond to edges. This problem is overcome by
considering only points of the image with sufficiently high
curvature as potential edge points. This algorithm detects mainly
the jump edges corresponding to the boundaries of vertical
obstacles. A more elaborate edge finder for detecting
discontinuities of the surface normals is based on region
segmentation.

Figure 4-3: Original range image and curvature image

4.4.  Segmentation  Algorithm 

The segmentation algorithm proceeds by combining partial
segmentations obtained from attributes, such as edges, surface
normals and curvatures, into a consistent scene segmentation
(figure 4-1). The advantage of the approach is that it takes into
account all the available information while dividing the whole
segmentation problem into smaller ones.

The initial segmentations are produced for each attribute
(3-component surface normals and curvatures) by a three-step
region growing algorithm:

1.  Find clusters in the attribute space, such as clusters
in the surface normals space.

2.  Identify the regions corresponding to those clusters
in the original image.

3.  Use these regions as the initial regions for a region
growing algorithm; the edges are used as region
boundaries in the region growing.

The segmentations obtained from individual attributes are then
merged together. The merging step compares each region of
one segmentation to the corresponding region in the image of
the other segmentations: if the segmentations agree, the region
is reported; otherwise it is split into connected regions consistent
with the other segmentations. Figure 4-4 shows the
segmentation of a short sequence of ERIM images. The regions
are labeled as smooth, i.e., accessible portions of the terrain, and
obstacle, i.e., objects in the scene. The first principal direction of
a region is attached to the region in the display.
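The merging step can be approximated by intersecting two label images, so that a region survives unchanged where both segmentations agree and is split where they disagree. This is a simplified sketch under stated assumptions: `merge_segmentations` is a hypothetical name, only two segmentations are combined, and the resulting regions are not re-checked for connectivity as the text requires.

```python
def merge_segmentations(seg_a, seg_b):
    """Combine two label images into one whose regions agree with both."""
    pair_to_label = {}
    merged = []
    for row_a, row_b in zip(seg_a, seg_b):
        out_row = []
        for pair in zip(row_a, row_b):
            # Each distinct (label_a, label_b) pair becomes one merged region.
            if pair not in pair_to_label:
                pair_to_label[pair] = len(pair_to_label)
            out_row.append(pair_to_label[pair])
        merged.append(out_row)
    return merged
```

A connected-component pass over the merged labels would be needed to split any pair region that is not spatially connected.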


5.  Surface  Reconstruction  and  3-D  Map 
Building 

In the previous section we presented techniques for extracting
information from ERIM images. In this section, we present
techniques for storing this information as a local 3-D map of the
environment. A map is a structure describing the geometry of
the environment from which one can derive information such as
the type of terrain at a given space location (x, y, z).

5.1.  Local Map

One important characteristic of a range image is that it permits
the production of a three-dimensional map of the current local
environment. Such a map, called a local map, is derived from
only one image and can be viewed as the local state of the
environment. This map can in turn be used to predict the
appearance of the scene from another viewpoint, and to plan a
safe path for a vehicle while taking into account the 3-D shape of
the traversed terrain. The path planning is especially important in
cross-country navigation, where the 'flat ground' assumption
does not usually hold.

Figure 5-1 shows an example of such a map and the
corresponding ERIM image. The map is displayed as a 3-D mesh
of measured points with the surface normal at each point.

5.2.  Global Map

All the techniques described so far proceed by independently
processing one image at a time. In an outdoor navigation
system, consecutive images are related to each other to develop
a global map. That is, the robot grabs an image every 1 to 10
meters, as in the sequence shown in figure 2-1; then each image
is registered with respect to the previous ones. In other words,
we merge the local maps produced by each individual image into
a global map describing the environment explored so far. Such a
global map can be used for two main purposes:

•  Object description refinement: A single image
provides only partial information about the identified
objects. For example, only the front part of the tree
is visible in figure 5-1 and, as a result, no information
is available in the cone-shaped unknown region
behind it. Putting together several local maps
obtained from several different viewpoints would
refine the object description by reducing the size of
the unknown regions.




Figure 5-1: Local map derived from an ERIM image


Figure  4-4:  Segmented  sequence 


•  Construction of a reference map: One
application of outdoor vision is the exploration
scenario. A vehicle equipped with sensors discovers
an unknown environment and stores information in a
reference map usable during later missions.

‘  -1  1-uMuvi  ii  cnnstrl.r.-d  .)•■  ,i  m  itching  process 

•  n  «  (»ri-.<  •(  ir»;t«jt^; 

’  "’  It  '■  ■■in.-  I  is  p.irl  nl  Hi..  il  map  ;uwl 

”  •  I'-:  s'  in  sms,  ijloli.il  connJiri.il..  system. 


*r  '■  T  '•s.tnri'",  from  I r , i it'** 
"  “  i.  'i'ijns  .■■■  < 

>..*.11.11  i.h  tins. 


>  fisi;i!i,",  ,;re 
i;  proxini.it.rd  by 


3.  Match the features of images 1 and 2 by using a tree
search procedure guided by the transformation
estimation. This procedure is similar to the ones
described in [1] and [3].

4.  The matching provides an estimate of the
transformation between images 1 and 2, which in
turn provides an estimate of the position of image 2
with respect to the global map.

5.  Include the local map derived from image 2 into the
global map. That involves the identification of
overlapping regions and the updating of object
descriptions.

We have done experiments on steps 1 through 4 using ERIM
data. Figures 5-2 and 5-3 show two consecutive frames. The
matching of the two images leads to a transformation estimate
which is applied to the second local map. Figure 5-4 shows the
two registered local maps. Finally, overlapping regions are
identified, thus leading to the updated map of figure 5-5. In this
example the map updating has been local at the pixel level and
does not include the updating of the symbolic description.
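Applying a transformation estimate to the points of a local map can be sketched as below. This is an illustration under a simplifying assumption that is not in the text: the vehicle motion is modeled as a rotation about the vertical axis followed by a translation, whereas the estimated transformation may be a general 3-D rigid motion. The function and parameter names are hypothetical.

```python
import math


def apply_transform(points, yaw, translation):
    """Rotate points about the vertical (z) axis by yaw, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]
```

Once the second local map is expressed in the coordinate system of the first, overlapping regions can be found by comparing point positions directly.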




Matching the features is efficient since the number of features
is usually small, and the transformation between images is
partially known beforehand. That is, bounds on the displacement
between two consecutive frames are available.

6.  Conclusion 

The techniques presented in this paper have been
tested in a realistic outdoor environment by mounting the
sensor on a mobile robot [4]. The results indicate that active
range data processing is suitable for navigation through an
unknown environment. Future work includes the combination of
range data with other sources of visual data, such as reflectance
or color, and 3-D object recognition, that is, the identification
of specific objects in the scene by matching extracted features
and stored models. The combination of these techniques will
provide a powerful system for outdoor scene interpretation.


Acknowledgments 

Mike Blackwell installed the hardware and software
environment of the ERIM sensor. The CMU Civil Engineering Lab
provided the testbed vehicle. Tanayosni Obataae implemented
part of the 3-D map builder. The authors wish to acknowledge
the CMU Image Understanding group for helpful discussions and
support.


References 

[1]  Faugeras, O.D., Hebert, M.
A 3D recognition and positioning algorithm using
geometrical constraints between primitive surfaces.
In Proc. Eighth Int. Joint Conf. on Artificial Intelligence,
pages 990-1002. Karlsruhe, August, 1983.

[2]  Faux, I.D., Pratt, M.J.
Computational Geometry for Design and Manufacture.
Ellis Horwood, 1979.

[3]  Hebert, M., Kanade, T.
The 3-D Profile Method for Object Recognition.
In Proc. CVPR'85, pages 456-434. San Francisco, June,
1985.

[4]  Kanade, T., Thorpe, C.
CMU Strategic Computing Vision Project Report: 1984 to 1985.
Forthcoming Technical Report,
Carnegie Mellon University, 1986.

[5]  Medioni, G., Nevatia, R.
Description of 3-D Surfaces Using Curvature Properties.
In Proc. Image Understanding Workshop, DARPA,
Science Applications, McLean, Va., 1984.

[6]  Ponce, J., Brady, M.
Toward a Surface Primal Sketch.
In Proc. Intern. Conf. on Robotics and Automation. St.
Louis, March, 1985.




DESCRIPTION OF SURFACES
FROM RANGE DATA¹

T.J. Fan, G. Medioni and R. Nevatia

Intelligent Systems Group
Departments of Electrical Engineering
and Computer Science
University of Southern California
Los Angeles, California 90089-0273


ABSTRACT 

Curvature properties, if known at every point on the
surface, completely specify the surface. We argue that
extracting significant curvature changes leads to a rich
and robust surface representation. Since curvature
computation is a highly noise sensitive operation, we smooth
the image with masks of increasing size, detecting
features at the smoothest level and localizing them at the
original one. Our representation consists of a list of labels
that are computed from the curvature properties but that
correspond to significant physical properties of the surface
such as jumps, folds, and extrema. We illustrate our
technique with several examples.

1.  INTRODUCTION 

We are interested in the description of 3-D surfaces
and objects, assuming that range data (i.e. the 3-D
positions) of the points on the visible surface are available,
say by the use of a laser range finder. We also assume
that this data is 'dense', in the sense of being sampled on
a certain grid and not just at discontinuities (as may be
the case for uninterpolated edge-based stereo data).

To generate useful descriptions, we need a useful
representation. In general, such a description should be
suitable for the task of object recognition and position
identification. It should be rich, so that similar objects can
be identified; stable, so that local changes do not radically
alter the descriptions; and have local support, so that
partially visible objects can be identified. It should also
enable us to recreate, from its features, a shape reasonably
close to the original one.

Generalized cones have come to be recognized as an
important class of representations that satisfy the above
requirements, particularly for complex objects which are
described as assemblies of smaller objects [1, 2].
However, generalized cones are 'volume' descriptions and
may not be suited for objects that are essentially surfaces,
such as a metal sheet, or for relatively smooth, 'featureless'
surfaces such as a turbine blade. In this paper, our
interest is primarily in the description of surfaces, though we
think that such descriptions will also be an important
tool in generating generalized cone descriptions.

¹ This research was supported by the Defense Advanced
Research Projects Agency under contract number
F33615-84-K-1404, monitored by the Air Force Wright
Aeronautical Laboratories, DARPA Order No. 3119.

It is well known in differential geometry that the
information given by the magnitude and the orientation of the
principal curvatures determines a surface completely and
uniquely. Our interest in this work is to use this data to
isolate important physical properties of a surface.

In particular, we are interested in the following surface
properties:

1.  jump boundaries, where the surface undergoes a
discontinuity

2.  folds, which correspond to surface orientation
discontinuities

3.  ridge lines, which correspond to smooth local
extrema of curvature

We will show that these surface properties can be
inferred from 'zero-crossings' and extremal values of
surface curvature properties.

Curvature properties have been used to describe
curves in the past by several authors, e.g. see [3, 4].
Interest in describing surfaces in 3-D is more recent,
perhaps because of the unavailability of good range data in the
distant past. Many authors have concentrated on
point-wise descriptions of surface properties rather than
extracting characteristic features [5, 6]. Our work is similar to
that of Ponce and Brady [7], but differs in detail. This
paper is an update of the approach we presented earlier
[8] and describes new implementations and results.

The next section presents a review of previous
approaches to surface representation, section 3 presents our
approach in detail, section 4 shows results for various
images and section 5 outlines future research directions.

2.  ALTERNATIVE APPROACHES TO
SURFACE DESCRIPTION

A recent survey paper by Jain and Besl [9] constitutes
an excellent overview of the field of range image analysis.
We can characterize previous approaches to 3-D surface
descriptions in the following classes:

-  approximation by 'simple' surfaces, such as planar
patches

-  extraction of 'edges' in range data

-  surface characterization

2.1.  Simple surfaces

Surface representation has been important in computer
graphics almost from the start. The earliest approaches
used approximation by planar patches (typically triangular);
later methods have used other surface patches such as
'Coons patches' and 2-D spline fits. We believe that such
methods, while adequate for graphics applications, are
rather poor for computer vision, as the approximating
surfaces are subject to large changes even if the observed
surfaces change only slightly. The number of patches
found is typically very large, and the points and lines
where the approximating patches are joined need not have
any physical significance.

Some examples of using surface approximations for
computer vision applications can be found in [10], [11],
[12], [13], [14] and [4].

2.2.  Edges

Another approach is to find 'edges' in surfaces
directly. The jump boundaries can be easily located by
specialized operators used for intensity edge detection.
The more difficult problem is the detection of smooth
edges such as at folds. Successful methods have been
developed for detecting these features when they can be
modeled as the intersection of two planar patches [15, 16],
or to hypothesize the presence of known objects in the
scene [17]. These features are then used in conjunction
with others to generate a useful description.

2.3.  3-D surface characterization

The methods to characterize surfaces can be viewed as
being in two sub-classes. The first describes the surfaces
by point-wise properties whereas the others attempt to
derive global descriptions. Several previous authors have
described methods of characterizing surfaces by using
curvature, but not all were aiming for global descriptions.

Lin and Perry [18] argue that integrals of the scalar
curvature of a surface provide good shape information.
The computation of these quantities is done using surface
triangulation. No experimental results are given in their
paper.

Sethi and Jayaramamurthy [6] first compute the surface
normal at each point, then extract characteristic contours,
which are sets of points for which surface normals are at
a constant inclination with a reference vector.
Underlying surfaces are then identified by taking a Hough
transform of these contours. This approach works well for
simple surfaces (cylinders, cones).

Laffey, Haralick and Watson [19] fit at each pixel a
two-dimensional cubic polynomial to estimate the first and
second partial derivatives at that point. They then
compute features to classify each pixel into classes such as
peaks or ridges. The output is still a dense representation
on which feature extraction has to be performed.

Langridge [20] gives the result of a preliminary
investigation into the problem of detecting and locating
discontinuities in the first derivatives of surfaces. Results are
shown only on 2 simple synthetic examples.

Nackman [21] proposes to extract peaks (local maxima),
pits (local minima) and passes (saddle points) of a surface,
to connect these features and obtain a graph which
appears useful to describe variations of a smooth surface.

Finally, Ponce and Brady [7, 4] adopt an approach very
similar to the one reported here, as they first smooth the
surface with Gaussian masks with increasing variance σ,
then compute extrema of the principal curvatures, and
finally group sets of features to recognize models such as
roof and step surface discontinuities. These operations
are very similar to what we describe, the differences being
in details of implementation (unfortunately, Ponce and
Brady's paper leaves out many of the details, making a
direct comparison difficult).

2.4.  Our  approach  to  using  curvature  properties 

Our approach consists of examining properties of
surface curvature to infer significant physical properties of
the surface. The curvature properties that we consider are
the extrema and the 'zero-crossings' of the surface
curvature. As surface curvature is dependent on the direction
of measurement, we need to perform this computation
either in 'all' directions, or choose specific orientations,
such as those of the principal curvatures. In this paper,
we do the former; in earlier work [8] we used the
maximum principal curvature, which causes some difficulties
as explained in section 3.1.

It can be shown that these curvature properties
correspond to certain significant physical properties of a
surface [7, 8]. An occluding boundary (sometimes referred to
as a 'jump boundary' in range data analysis) creates a
zero-crossing of the curvature in a direction normal to
that of the boundary. A 'fold boundary' (where surface
normals are discontinuous) causes a local extremum of the
curvature at that point. Fold boundaries may also create
zero-crossings away from the location of the boundary
itself. Lastly, curvature extrema correspond to certain
distinguished points or lines on smooth surfaces, such as
along the extremes of the major axis of the cross-sections
of an elliptical cylinder.

To turn these observations into functioning algorithms,
we need to solve the following:

1.  Compute curvature properties:

Since we work with digital data, we can only
compute approximations of curvature, either by
differencing or by fitting analytical surfaces to given
data.

2.  Extract and localize significant features of the
curvature properties:

Curvature, being a local measure, is highly noise
sensitive. Larger masks to compute differences may
be used, but they result in a loss of accuracy of
localization. We follow a scale-space filtering
approach [22] to help solve these problems.

3.  Interpret sets of curvature features in terms of
physical properties of surfaces:

We wish to detect occluding boundaries, surface
discontinuities and smooth local extrema.


While these steps are rather straightforward in concept,
their implementation requires resolution of many detailed
issues such as how to compute curvature, which curvature
properties to use, and how to combine information from
different scales. We next describe an earlier
implementation [8] and its deficiencies, and then a new
implementation designed to overcome these problems.

3.  DESCRIPTION OF THE METHOD


3.1.  Previous Implementation

In our previous implementation [8] we worked
primarily with the maximum principal curvature, say of
magnitude κ1 and direction θ1. We computed
zero-crossings and extrema of κ1, with the data smoothed by
Gaussian masks of different variance. These features were
then combined to give surface descriptions.

This method worked well on many examples, but
exhibited strange behavior in others. The principal cause of
this difficulty is that the directions of the two principal
curvatures κ1 and κ2 can switch when the two have nearly
equal values. This happens near a saddle surface, but also
due to even small perturbations in a nearly flat surface.

To overcome these problems, our new implementation
uses curvature properties along various specific directions
(though still computed from the principal curvatures κ1
and κ2). The algorithms for tracking across different
scales and labelling are also different.


3.2.  Description of the new approach

A block diagram of our method is given in figure 1.
Our basic approach is to first smooth the image with a
number of Gaussian masks of different variance and
compute the curvature in four different directions. We then
detect extrema and zero-crossings of the resulting curves
in each direction and for each smoothed image (i.e. at
different scales). After filtering out insignificant features,
features from different scales are combined in the step of
'space-tracking' to give robust and well-localized features.
We then combine adjacent features ('space grouping'),
followed by merging of results from the four directions. We
now have point features of interest, which are linked to
form curves of interest. Additional descriptions of these
curves finally allow us to assign descriptions in terms of
physical properties of the surfaces.
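The detection of extrema and zero-crossings along one direction can be illustrated on a 1-D curvature profile. This is a minimal sketch under stated assumptions: the function names and the `min_magnitude` threshold used to discard insignificant extrema are hypothetical and are not taken from the paper.

```python
def zero_crossings(values):
    """Indices i where the sign changes between values[i] and values[i + 1]."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0]


def local_extrema(values, min_magnitude=0.0):
    """Indices of strict local maxima or minima with |value| >= min_magnitude."""
    found = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        is_max = cur > prev and cur > nxt
        is_min = cur < prev and cur < nxt
        if (is_max or is_min) and abs(cur) >= min_magnitude:
            found.append(i)
    return found
```

Running both functions on every row, column and diagonal of a curvature image, at each smoothing scale, would give the per-direction feature sets that the later tracking and grouping steps combine.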


Figure 1: Block diagram of the overall processing

These steps are explained in detail below. To illustrate
our steps we will use the example of a 'cup' with an
elliptical cross-section, shown in figure 2. This data was
obtained using an active stereo range finding system at
INRIA [23], courtesy of Dr. Fabrice Clara. (The elliptical
effect was created artificially by scaling the data.) The
resolution is 100 x 80 pixels.

3.2.1. Gaussian smoothing

Computation of curvature is likely to be highly noise
sensitive. To decrease the effects of noise, we can use a
larger support for computing differences, at a possible
cost of accuracy of localization. We convolve the image
with rotationally symmetric Gaussian masks of different
variance, $\sigma$. The different sizes of the filter give us
curvature at different scales, and are in fact helpful in
interpreting the results (as in [24] and [22] for other
applications). The width of the mask is chosen such that
the volume under the truncated Gaussian is very close to
1 ($6\sigma$ is a good approximation). In the current
implementation, we use $\sigma$ = 0.5, 1.0, 1.5.
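The smoothing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `gaussian_mask` and `smooth`, the truncation at 3 sigma on each side (a total width of about 6 sigma), the renormalization of the truncated mask, and the edge-replicating border handling are all assumptions.

```python
import numpy as np

def gaussian_mask(sigma):
    """Rotationally symmetric Gaussian mask truncated to a width of about 6*sigma."""
    half = max(1, int(np.ceil(3.0 * sigma)))      # assumed: 3*sigma on each side
    ax = np.arange(-half, half + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    mask = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return mask / mask.sum()                       # renormalize so the weights sum to 1

def smooth(image, sigma):
    """Convolve a range image with the mask; borders use edge replication (assumed)."""
    mask = gaussian_mask(sigma)
    half = mask.shape[0] // 2
    padded = np.pad(image, half, mode="edge")
    out = np.zeros_like(image, dtype=float)
    # The mask is symmetric, so correlation and convolution coincide.
    for dy in range(mask.shape[0]):
        for dx in range(mask.shape[1]):
            out += mask[dy, dx] * padded[dy:dy + image.shape[0],
                                         dx:dx + image.shape[1]]
    return out

# One mask per scale used in the text.
scales = {s: gaussian_mask(s) for s in (0.5, 1.0, 1.5)}
```

Calling `smooth(z, s)` for each `s` in `scales` yields one smoothed image per scale; the unsmoothed image plays the role of the scale labelled 0 in the figures.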


Figure 2: 3-D plot of the 'cup' image

3.2.3. Derivatives

Since we work with discrete data, we compute differences
rather than derivatives. We compute the first-order
differences in the x and y directions, and the second
order cross derivative by convolving the images with the
masks shown below.

    -1  0  1        -1 -1 -1         1  0 -1
    -1  0  1         0  0  0         0  0  0
    -1  0  1         1  1  1        -1  0  1

The outputs of these masks are normalized by the sum of
the weights in the mask. Second order differences are obtained
by convolving with these masks again.
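The three masks can be applied as in the sketch below. The normalization constants (6 for the first-order masks, 4 for the cross mask) are an assumption chosen so that a unit slope gives a unit difference; the text only says the outputs are normalized by the mask weights. The names `apply_mask` and `differences` are hypothetical.

```python
import numpy as np

# The three 3x3 masks from the text (rows index y, columns index x).
MASK_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
MASK_Y = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=float)
MASK_XY = np.array([[1, 0, -1], [0, 0, 0], [-1, 0, 1]], dtype=float)

def apply_mask(z, mask, norm):
    """Correlate z with a 3x3 mask on interior pixels and divide by `norm`."""
    h, w = z.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += mask[i, j] * z[i:i + h - 2, j:j + w - 2]
    return out / norm

def differences(z):
    """First-order differences p, q and cross difference s of a range image z."""
    p = apply_mask(z, MASK_X, 6.0)   # assumed normalization: unit slope in x gives 1
    q = apply_mask(z, MASK_Y, 6.0)   # assumed normalization: unit slope in y gives 1
    s = apply_mask(z, MASK_XY, 4.0)  # assumed normalization: z = x*y gives 1
    return p, q, s
```

Second-order differences in x and y would follow by applying `MASK_X` to `p` and `MASK_Y` to `q`, at the cost of one more border pixel on each side.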

Computation  of  Principal  Curvatures 

Once these derivatives are estimated, the two principal
curvatures are computed by solving a second degree
equation, as explained in [8]. By convention, $\kappa_1$ denotes
the larger of the two. The orientation is obtained by solving
a first degree differential equation. Since the principal
directions are orthogonal, it is sufficient to represent the
angle $\theta$ of $\kappa_1$ with the X-axis. We are currently
investigating alternative ways to compute the principal
curvatures, such as by using the facet model [25]. Figure 3
shows a "needle diagram" computed from the cup data; a
line is shown in the direction of the maximal principal
curvature, and the length of the line is proportional to the
magnitude of this curvature.
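One common way to obtain the two principal curvatures from the estimated derivatives is through the Gaussian curvature K and mean curvature H of the surface z(x, y), since the principal curvatures are the roots of the second degree equation k^2 - 2Hk + K = 0. The sketch below uses these textbook formulas; it is not necessarily the formulation of [8], and the symbols p, q, r, s, t (for the first, second and cross derivatives) are naming assumptions. The principal direction is not computed here.

```python
import numpy as np

def principal_curvatures(p, q, r, s, t):
    """Principal curvatures of z(x, y) from p=z_x, q=z_y, r=z_xx, s=z_xy, t=z_yy.

    Returns (k1, k2) with k1 >= k2, the roots of k^2 - 2*H*k + K = 0.
    """
    g = 1.0 + p**2 + q**2
    K = (r * t - s**2) / g**2                                   # Gaussian curvature
    H = ((1 + q**2) * r - 2 * p * q * s + (1 + p**2) * t) / (2 * g**1.5)  # mean curvature
    root = np.sqrt(np.maximum(H**2 - K, 0.0))                   # clamp round-off
    return H + root, H - root
```

The function accepts scalars or whole arrays of derivative estimates, so it can be applied to every pixel of a smoothed image at once.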

3.2.4. Gaussian Curvature

Some previous empirical observations suggest that the
shape of the regions of constant Gaussian curvature sign
roughly reflects the overall shape of an object [25]. We
compute these regions from the unsmoothed range image
(to preserve accurate localization of boundaries). To get
rid of very small regions, we apply a sequence of expand
and contract operations. At this stage, we have our first
surface description, obtained independently of the subsequent
processing applied to the image. The cup figure,
however, has no such interesting regions, as Gaussian curvature
is essentially zero everywhere. (Another example is
shown in the results section later.) In figure 4, zones of
zero Gaussian curvature are shown in grey, positive in
black and negative in white.


Figure 3: Needle representation of the maximum
principal curvature

Figure 4: Regions of constant Gaussian curvature
sign


3.2.5. Curvature along a line

Given the principal curvatures $\kappa_1$, $\kappa_2$ and the angle $\theta$,
it is possible to compute the curvature in any direction $\phi$:

$$\kappa_\phi = \kappa_1 \cos^2(\phi - \theta) + \kappa_2 \sin^2(\phi - \theta)$$

Curvature values along a given direction give us a
one-dimensional curve. In the current implementation, we
use 4 directions: $\phi$ = 0°, 45°, 90° and 135°. It may be
more accurate to compute these quantities using different
masks, or directly from an approximation such as the facet
model; we are investigating these issues currently.
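The directional curvature formula can be evaluated for the four directions as in the following sketch. The function names are hypothetical, and angles are assumed to be in radians internally.

```python
import numpy as np

DIRECTIONS_DEG = (0.0, 45.0, 90.0, 135.0)   # the four directions used in the text

def directional_curvature(k1, k2, theta, phi):
    """Curvature in direction phi, given principal curvatures and the angle theta of k1."""
    d = phi - theta
    return k1 * np.cos(d)**2 + k2 * np.sin(d)**2

def four_direction_curvatures(k1, k2, theta):
    """Map each direction (in degrees) to its curvature value."""
    return {deg: directional_curvature(k1, k2, theta, np.deg2rad(deg))
            for deg in DIRECTIONS_DEG}
```

Because the function is written with NumPy operations, `k1`, `k2` and `theta` may be full images, which gives one curvature image per direction.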

3.2.6. Extrema and Zero-crossings

For each of these one-dimensional curves, we compute
the zero-crossings and the local extrema. A local extremum
is defined as a point whose absolute value is strictly
larger than the absolute value of one of its two neighbors
and larger or equal to the other one. Extrema below a
small threshold are discarded. A zero-crossing is given by
a zero surrounded by non-zero numbers of opposite sign
on the two sides or by a sequence of two numbers of opposite
sign; in the latter case, its location is marked at the
smaller of the two numbers. No thresholding is performed
at this stage. Each zero-crossing also has a positive extremum
and a negative extremum associated with it on either
side.
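The two definitions above translate almost directly into code. The sketch below operates on a single one-dimensional curvature profile; the function names, the handling of the end points (they are never reported as extrema), and reading "the smaller of the two numbers" as the smaller in magnitude are assumptions.

```python
import numpy as np

def local_extrema(curve, threshold):
    """Return (index, sign) for each local extremum with |value| >= threshold."""
    c = np.asarray(curve, dtype=float)
    a = np.abs(c)
    found = []
    for i in range(1, len(c) - 1):
        left, right = a[i - 1], a[i + 1]
        strictly_one = a[i] > left or a[i] > right      # strictly larger than one neighbor
        at_least_both = a[i] >= left and a[i] >= right  # larger or equal to the other
        if strictly_one and at_least_both and a[i] >= threshold:
            found.append((i, 1 if c[i] > 0 else -1))
    return found

def zero_crossings(curve):
    """Return the indices at which zero-crossings are marked."""
    c = np.asarray(curve, dtype=float)
    found = []
    for i in range(len(c) - 1):
        if c[i] * c[i + 1] < 0:
            # Two adjacent values of opposite sign: mark the smaller magnitude.
            found.append(i if abs(c[i]) <= abs(c[i + 1]) else i + 1)
        elif c[i] == 0 and i > 0 and c[i - 1] * c[i + 1] < 0:
            # An exact zero between values of opposite sign.
            found.append(i)
    return found
```

For example, on the profile `[0.1, 0.5, 2.0, 1.0, -0.2, -3.0, -1.0, 0.0, 1.5, 0.5]` with a threshold of 0.3, the extrema are found at indices 2, 5 and 8 and the zero-crossings at indices 4 and 7.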

3.2.7. Filtering

At any given point, only information in one direction is
useful, in general. If a jump boundary occurs in the 0°
direction, the descriptions in the other directions at the
same location should be ignored. A zero-crossing is
retained only if one of its associated extrema meets the
criterion that, at the extremum, the value of the curvature
in directions different from the current direction does not
exceed the extremum value. Note that it is still possible
for a zero-crossing and extrema to be associated with
more than one direction after this step.

Figure 5 shows the extrema and zero-crossings
detected from the cup data in the X-direction (horizontal).
The top row shows the positive extrema, the middle row
the zero-crossings and the bottom row the negative extrema.
The four columns used smoothing masks of
variance 0, 0.5, 1.0 and 1.5 from left to right respectively.
Figure 6 shows similar results for the vertical direction.

3.2.8. Scale tracking

Use of multiple scales allows us to increase our confidence
in detecting features without loss of localization.
By increasing the scale, we increase the signal-to-noise
ratio and therefore the confidence in the features that we
extract, at the cost of accuracy of localization. Here, we
detect the features in the image smoothed with the widest
filter, and localize them using the smallest filter. Since we
are using only a discrete set of filters, we have to solve a
correspondence problem between levels. As the scale increases,
we know that new features may not appear, but
that features from a lower level may merge. Shifts of the
features for different scales can also be predicted [7].


In our implementation we track extrema and zero-crossings
independently. The strategy we follow is
coarse-to-fine: only features present at the coarsest
level may be part of the description. (For some applications,
it may be advantageous to have a hierarchical
description, consisting of features tracked at all levels,
then at all levels except the coarsest, and so on.) The
search is one-dimensional, and the amount of displacement
of a given feature depends on the value of $\sigma$ (2
pixels at most for a 0.5 variation in $\sigma$). The direction of
displacement must remain the same for different scales
for an extremum, but is allowed to change by one pixel for a
zero-crossing (as its localization is ambiguous by one
pixel).

When tracking an extremum, if we come to a 'fork',
that is, a choice between two extrema at the next finer
scale, we choose the extremum with the higher curvature
value. If we come to a fork for a zero-crossing, we stop
the tracking and simply mark the position at the lowest
unambiguous level.
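A simplified sketch of coarse-to-fine tracking for extrema along one line is given below. It models only the bounded displacement (2 pixels per level) and the fork rule for extrema; the constraint that the direction of displacement stays the same across scales, and the separate rule for zero-crossings, are left out. The data layout (one dictionary per scale mapping position to curvature value) and the name `track_extrema` are assumptions.

```python
def track_extrema(levels, max_shift=2):
    """Track extrema from the coarsest level to the finest one.

    levels: list of dicts {position: curvature value}, ordered coarse to fine.
    Returns a list of (coarse_position, fine_position) pairs.
    """
    tracked = []
    for pos in sorted(levels[0]):            # only coarsest-level features are kept
        current = pos
        for finer in levels[1:]:
            candidates = [q for q in finer if abs(q - current) <= max_shift]
            if not candidates:
                current = None               # feature lost at a finer scale
                break
            # Fork: keep the candidate with the higher curvature magnitude.
            current = max(candidates, key=lambda q: abs(finer[q]))
        if current is not None:
            tracked.append((pos, current))
    return tracked
```

Features that exist only at finer levels are never returned, which reflects the rule that only features present at the coarsest level may be part of the description.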

At this point we filter out some zero-crossings that
have been introduced due to the following artifact. In the
cup example, ideally, the curvature is maximum along
horizontal curves and zero along the vertical. However,
instabilities occur near the edge. Near the edge, the surface
is receding rapidly from the viewer and hence the
first derivative $f'$ is large (tends to $\infty$). As curvature is
given by $f''/(1 + f'^2)^{3/2}$, the curvature tends to zero. Thus, if
there is a small 'bump' in the surface near the edge, the
curvature in the vertical direction is likely to be higher
than in the horizontal direction, giving undesirable
zero-crossings. This is illustrated in fig. 6(b), which shows
many zero-crossings in the Y direction close to the right
border of the cup. To handle this, we remove those
zero-crossings where the derivative in the orthogonal direction
is high (above a threshold).

Figure 7 shows the results after scale tracking. The
top and bottom rows show the results in the X and Y
directions respectively. The first column shows the positive
extrema, the second the negative extrema and the last
the zero-crossings.


Figure 5: Extrema and zero-crossings in the horizontal direction for $\sigma$ increasing from 0 to 1.5:
(a) positive extrema, (b) zero crossings, (c) negative extrema

Figure 6: Extrema and zero-crossings in the vertical direction for $\sigma$ increasing from 0 to 1.5:
(a) positive extrema, (b) zero crossings, (c) negative extrema

Figure 7: Results of the scale tracking procedure, (a) horizontal and (b) vertical; from left to right:
positive extrema, negative extrema, zero-crossings


3.2.9. Space grouping

This step is used to find the characteristic groups of
features (extrema and zero-crossings) that we call
junctions. We define 5 types of junctions:

- isolated positive extremum (+)

- isolated negative extremum (-)

- linked positive extremum and zero-crossing (+0)

- linked negative extremum and zero-crossing (-0)

- linked positive extremum and zero-crossing and
negative extremum (+0-)

These 5 junctions generate in turn the following dictionary
of 8 labels for the individual features of a junction:

label 1: isolated positive extremum

label 2: isolated negative extremum

label 3: zero-crossing linked to a positive
extremum only

label 4: zero-crossing linked to a negative
extremum only

label 5: zero-crossing linked to a positive
and a negative extremum

label 6: isolated zero-crossing (to be discarded)

label 8: positive extremum linked to a
zero-crossing (label 3 or 5)

label 9: negative extremum linked to a
zero-crossing (label 4 or 5)

The algorithm to compute these junctions is applied
for each direction independently (and hence is one-dimensional).
We first locate a feature and then search for
another feature along the given direction (limited to 3
pixels based on analysis of expected locations). Depending
on the type of the adjacent features, labels are assigned
to the junction (group of features). (For a zero-crossing,
up to three features may need to be labelled
simultaneously.) The grouping and labelling algorithm is
given below.


TRACK(p): pixel value of input picture;
JUNCT(p): pixel value of output picture;
flag(p): flags of point p;
zero, plus, minus: flags for zero-crossing, positive
and negative values;

1. For each nonzero point p1 in the input picture do:

2.   Find the first nonzero point p2 in the positive direction
     of searching (+y, +x, +45, +135) and within distance 3;

3.   case (TRACK(p1), TRACK(p2)):

       (plus, zero):
       (zero, plus):   flag(p1) := flag(p1) OR zero OR plus;
                       flag(p2) := flag(p2) OR zero OR plus;
                       break;

       (minus, zero):
       (zero, minus):  flag(p1) := flag(p1) OR zero OR minus;
                       flag(p2) := flag(p2) OR zero OR minus;
                       break;

4. end 1;

5. For each nonzero point p in the input picture do:

6.   assign JUNCT(p) according to the following table:

       TRACK    flag                JUNCT

       plus     0                   1
       minus    0                   2
       zero     zero+plus           3
       zero     zero+minus          4
       zero     zero+plus+minus     5
       zero     0                   6
       plus     zero+plus           8
       minus    zero+minus          9
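The grouping and labelling steps above can be sketched for a single one-dimensional line as follows. The representation of the TRACK values as the strings '+', '-' and '0' (with `None` for empty pixels), the bit-flag encoding, and the name `label_junctions` are illustrative assumptions, not part of the original algorithm listing.

```python
ZERO, PLUS, MINUS = 1, 2, 4          # bit flags for the accumulated neighbors

# (TRACK symbol, accumulated flags) -> JUNCT label, following the table in the text
JUNCT_TABLE = {
    ("+", 0): 1,
    ("-", 0): 2,
    ("0", ZERO | PLUS): 3,
    ("0", ZERO | MINUS): 4,
    ("0", ZERO | PLUS | MINUS): 5,
    ("0", 0): 6,
    ("+", ZERO | PLUS): 8,
    ("-", ZERO | MINUS): 9,
}

def label_junctions(track, max_dist=3):
    """track: list with '+', '-' or '0' at feature pixels and None elsewhere."""
    n = len(track)
    flags = [0] * n
    for i in range(n):
        if track[i] is None:
            continue
        # First feature in the positive search direction, within max_dist pixels.
        j = next((k for k in range(i + 1, min(i + max_dist, n - 1) + 1)
                  if track[k] is not None), None)
        if j is None:
            continue
        pair = {track[i], track[j]}
        if pair == {"+", "0"}:
            flags[i] |= ZERO | PLUS
            flags[j] |= ZERO | PLUS
        elif pair == {"-", "0"}:
            flags[i] |= ZERO | MINUS
            flags[j] |= ZERO | MINUS
    return [None if track[i] is None else JUNCT_TABLE.get((track[i], flags[i]))
            for i in range(n)]
```

For a line containing a positive extremum, a zero-crossing and a negative extremum within 3 pixels of each other, the three features receive labels 8, 5 and 9 respectively, which corresponds to the +0- junction.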

3.2.10. Merging

The four junction images are merged into a single image.
Each pixel is again labelled according to the dictionary
given above. It is possible for a pixel to have been
assigned a label in more than one direction. If so, we
choose the label with the highest priority as defined by its
position in the following list:

    5       zero-crossing with + and -
    3, 4    zero-crossing with + or -
    8, 9    + or - with zero-crossing
    1, 2    single + or -
    6       single zero

If a pixel is marked with labels having the same priority
level, we keep the one having the largest magnitude. Note
that due to digital computations, the same feature may
also appear at adjacent pixels in different directions. In
this case, we used a simple "thinning" process, as
described in [27].
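The priority rule for a single pixel can be sketched as below. The text does not say which magnitude breaks ties; the sketch assumes it is the magnitude of the curvature value attached to each candidate. The names `PRIORITY` and `merge_pixel` are hypothetical.

```python
# Rank of each label in the priority list of the text (lower rank wins).
PRIORITY = {5: 0, 3: 1, 4: 1, 8: 2, 9: 2, 1: 3, 2: 3, 6: 4}

def merge_pixel(candidates):
    """Pick one (label, magnitude) among the candidates from the four directions.

    The highest-priority label wins; ties within a priority level are broken
    by the largest magnitude (assumed to be the curvature magnitude).
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: (PRIORITY[c[0]], -abs(c[1])))
```

For instance, `merge_pixel([(1, 5.0), (3, 0.2), (4, 0.9)])` keeps `(4, 0.9)`: labels 3 and 4 outrank label 1, and of the two the one with the larger magnitude is kept.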

Figure 8 shows the results of space grouping on the
cup data. Fig. 8(a) shows the zero-crossings (labels 3, 4
and 5), fig. 8(b) shows the isolated negative extrema (label
2) and fig. 8(c) the isolated positive extrema (label 1).
Figure 8(d) shows all of these together. In particular, note
that the edges in the middle of fig. 8(b) correspond to the
distinguished points on the elliptical cross-section of the
cup.

3.2.11.  Spatial  linking 

The objective of this step is to connect point features
to form curves. First, we localize each junction at the
position of the zero-crossing if the junction has one (+0, -0
or +0-), or at the extremum location if it is isolated (+ or -).
Junctions with the same label are linked if their orientation
is compatible (45° or less apart), and one-pixel gaps are
filled. If we come to a fork, we choose the longest branch
(found by look-ahead search), and generate separate lines
for the smaller branches. At the end of this step, we have
a non-iconic description of the surface.

3.2.12.  Curvature  descriptors 

We  can  now  associate  additional  descriptions  with  the 
detected  curves  as  follows: 

- isolated extrema:

the value of the maximum principal curvature and its
orientation are sufficient.

- extremum - zero-crossing:

locally, this is represented by the value of the extremum
of curvature and two angles $\alpha$ and $\beta$ as
shown in figure 9.

- extremum - zero-crossing - extremum:

this is represented by the values of the two opposite
sign extrema and the four angles $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$ as
shown in figure 10.

An  example  of  the  resulting  description  is  shown  below 
for  the  cup  image: 

Line #1:

Starting point (21,63); Ending point (S4,63)

Line length = 68; Type = 3

Strength = 7.63*127; Alpha1 = 11.222473;

Alpha2 = -3.65478*;

Beta1 = 42.866863; Beta2 = 42.866863

Line #5:

Starting point (6,36); Ending point (6,61)

Line length = 26; Type = 5

Strength = 20.041059; Alpha1 = 6.844843;

Alpha2 = -1.430612;

Beta1 = 12.753203; Beta2 = =1.742495

3.2.13. Surface descriptors

We can now translate the curvature descriptions to
descriptions of significant surface changes.

- jump boundaries:

they correspond to the plus-zero-minus above. The
same type of descriptions can be derived (height of
the step and orientation of the two surfaces).

- folds:

they correspond to the extremum-zero above. These
need to be further classified to describe the 2 adjoining
surfaces, and the orientation of the fold with
respect to the viewer.

- curvature extrema:

they are the single extrema. We need to give the
orientation with respect to the viewer.


We show results on one more example here (the
processing steps are the same as in the previous section).
Fig. 11(a) shows the object (it is a synthetic range image
of a half-bottle). Figure 11(b) shows the needle diagram
and fig. 11(c) shows regions of positive and negative curvature
(lighter regions show negative Gaussian curvature
regions, darker regions show positive Gaussian curvature
regions). For this example, the Gaussian curvature gives
a useful crude segmentation.

Figure 12 shows the main results of our processing.
Figure 12(a) shows the zero-crossings, fig. 12(b) the isolated
positive extrema, fig. 12(c) the isolated negative extrema
and fig. 12(d) shows them all together.

We can draw some conclusions from these two examples.
First, significant occluding boundaries and fold
boundaries are detected well (though jump boundaries
would be detected as well or better by any normal edge
detector also). In addition, other significant curves are
also detected: on the cup, the line in the middle corresponds
to the apex of the elliptical cross-sections. In
the bottle, we detect the curvature extrema where the
bottle cross-section is the largest and also where it is the
smallest.

We have used these descriptions for the bottle to
reconstruct the original surface using an implementation
by Cochran [28] at USC that follows Terzopoulos' scheme
[26, 29], as shown in figure 13.

We believe that these results show the essential utility
and feasibility of the proposed representation scheme,
though many of the details can be improved. We are in
the process of testing our methods on a larger set of
images and using the descriptions for higher levels of
processing.




1. Binford, T.O., "Visual Perception by Computer," IEEE
Conference on Systems and Controls,
December 1971.

2. Nevatia, R. and Binford, T.O., "Description and Recognition
of Complex-Curved Objects," Artificial
Intelligence, Vol. 8, 1977, pp. 77-98.

3. Marimont, David H., "A Representation for Image
Curves," AAAI-84, 1984, pp. 237-242.

4. Brady, M., Ponce, J., Yuille, A. and Asada, H.,
"Describing Surfaces," Proceedings of the 2nd
International Symposium on Robotics
Research, H. Hanafusa and H. Inoue, eds., Massachusetts
Institute of Technology Press, Cambridge, Mass.,
1985.


5. Besl, P.J. and Jain, R.C., "Intrinsic and Extrinsic Surface
Characteristics," Proceedings of the IEEE
Computer Vision and Pattern Recognition
Conference, San Francisco, Calif., June 9-13 1985,
pp. 226-233.

6. Sethi, I.K. and Jayaramamurthy, S.N., "Surface Classification
using Characteristic Contours,"
Proceedings of International Conference
on Pattern Recognition, August
1984, pp. 438-440.

7. Ponce, J. and Brady, M., "Toward a Surface Primal
Sketch," Proceedings of the IEEE International
Conference on Robotics and
Automation, St. Louis, March 25-28 1985, pp.
420-425.


8. Medioni, G. and Nevatia, R., "Matching Images Using
Linear Features," IEEE Transactions on Pattern
Analysis and Machine Intelligence,
Vol. 6, No. 6, November 1984, pp. 675-685.

9. Besl, P.J. and Jain, R.C., "Three-Dimensional Object
Recognition," ACM Computing Surveys, Vol. 17,
No. 1, March 1985, pp. 75-145.

10. Milgram, D.L. and Bjorklund, C.M., "Range Image
Processing: Planar Surface Extraction,"
International Joint Conference on Pattern
Recognition-5, 1980, pp. 912-919.

11. Henderson, T.C., "Efficient 3-D Object Representations
for Industrial Vision Systems," IEEE Transactions
on Pattern Analysis and Machine
Intelligence, Vol. 5, No. 6, November 1983, pp.
609-617.

12. Bhanu, B., "Surface Representation and Shape Matching
of 3-D Objects," Proceedings of the IEEE
Pattern Recognition and Image Processing
Conference, Las Vegas, Nev., June 14-17
1982, pp. 349-354.

13. Hebert, M. and Ponce, J., "A New Method for Segmenting
3-D Scenes into Primitives,"
International Joint Conference on Pattern
Recognition, 1982, pp. 836-838.

14. Oshima, M. and Shirai, Y., "A Scene Description
Method Using Three-Dimensional Information,"
Pattern Recognition, 1979, pp. 9-17.

15. Inokuchi, Seiji, Nita, Takeshi, Matsuda, Fumio,
Sakurai, Yoshifumi, "A Three-Dimensional Edge-Region
Operator for Range Pictures," Proceedings
of International Joint Conference on
Pattern Recognition-6, October 1982, pp.
918-920.

16. Mitiche, A. and Aggarwal, J.K., "Detection of Edges
Using Range Information," IEEE Transactions on
Pattern Analysis and Machine
Intelligence, Vol. 5, No. 2, March 1983, pp.
174-178.

17. Bolles, R.C., Horaud, P. and Hannah, M.J., "A Three-Dimensional
Part Orientation System," Proceedings
of the 8th International Joint Conference
on Artificial Intelligence,
Karlsruhe, West Germany, August 8-12 1983, pp.
1116-1120.

18. Lin, C. and Perry, M.J., "Shape Description using Surface
Triangularization," IEEE Proceedings
Workshop on Computer Vision: Representation
and Control, August 1982, pp. 38-43.

19. Laffey, Thomas J., Haralick, Robert M., Watson, Layne
T., "Topographic Classification of Digital Image Intensity
Surfaces," IEEE Proceedings Workshop on
Computer Vision: Representation and
Control, August 1982, pp. 171-177.

20. Langridge, D.J., "Detection of Discontinuities in the
First Derivatives of Surfaces," Computer Vision,
Graphics, and Image Processing, Vol. 27,
September 1984, pp. 291-308.

21. Nackman, L.R., "Two-Dimensional Critical Point Configuration
Graphs," IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 6,
No. 4, July 1984, pp. 442-449.

22. Witkin, A.P., "Scale-Space Filtering," Proceedings
of Eighth IJCAI, Karlsruhe, West Germany,
August 1983, pp. 1019-1022.

23. Boissonnat, J.D. and Faugeras, O.D., "Triangulation of
3-D Objects," Proceedings of the 7th International
Joint Conference on Artificial
Intelligence, Vancouver, BC, Canada,
August 24-28 1981, pp. 658-660.

24. Asada, H. and Brady, M., "The Curvature Primal
Sketch," Proceedings of the 2nd IEEE
Workshop on Computer Vision: Representation
and Control, Annapolis, MD, May 1984,
pp. 8-17.




25. Haralick, R.M., "Digital Step Edges from Zero-Crossings
of Second Directional Derivatives," IEEE Transactions
on Pattern Analysis and Machine
Intelligence, Vol. 6, No. 1, January 1984, pp.
58-68.

26. Terzopoulos, D., Multiresolution Computation
of Visible-Surface Representations, PhD
dissertation, Massachusetts Institute of Technology,
Department of Computer Science and Electrical Engineering,
January 1984.

27. Rosenfeld, A. and Kak, A., Digital Picture
Processing, Academic Press, New York, 1977.

28. Cochran, S. and Medioni, G., "Implementation of a
Multiresolution Surface Reconstruction Algorithm," to
appear in a USC internal technical report.

29. Terzopoulos, D., "Computing Visible-Surface
Representations," Massachusetts Institute of
Technology AI Lab, No. AI Memo 800, 1985.


DISPARITY  FUNCTIONALS  AND  STEREO  VISION 




Roger D. Eastman and Allen M. Waxman*


Center  for  Automation  Research 
University  of  Maryland 
College  Park,  Maryland  20742 


ABSTRACT 

This paper investigates stereo matching constraints
that derive from an analytic model of surface depth.
Computational stereo is formulated as a single stage process
in which potential feature point or contour matches
interact to provide support for local estimates of a polynomial
model of disparity (the disparity functional), not
just estimates of disparity at isolated points. An algorithm
is presented that integrates the disparity functional
with multiresolution matching of zero-crossings to derive
depth to surface patches. The analyticity of the disparity
field is thereby exploited early in the matching process,
and yields surface reconstruction as a direct byproduct of
correspondence.


1.  INTRODUCTION 

Recent computational approaches to the problem of
stereo correspondence have emphasized the use of
geometric matching constraints derived from models of
camera optics, relative camera geometry, scene depth and
photometry. Smoothness is an important property of
depth that can be translated into a matching constraint.
Since depth usually varies smoothly across surfaces, the
disparity given by correct matches should also vary
smoothly except at surface boundaries (Marr and Poggio
[8]).

In this paper, we investigate matching constraints
that derive from an analytic model of surface depth combined
with a model of parallel stereo cameras. Analyticity
mathematically formulates smoothness by modeling
object surfaces, and therefore the disparity field, as piecewise
analytic functions of visual direction. Our model of
analytic coherence mathematically formulates the principle
of coherence stated by Prazdny [12], and can describe
transparent as well as opaque surfaces. In using this property,
we follow the work in stereo of Koenderink and
van Doorn [6] and the work in motion of Waxman and
Ullman [19], Waxman [17], Waxman and Wohn [20] and
Wohn [21]. Waxman and Wohn [20] developed the Velocity
Functional Method for recovering the deformation
parameters of image flow fields, and we propose here the
Disparity Functional Method for the disparity field. The
disparity functional is a polynomial model of the disparity
field in a neighborhood. We show in this paper that the
locally linear disparity functional is a useful representation
of the disparity field for performing both correspondence
and surface reconstruction.

*Present address: Thinking Machines Corporation, 245
First Street, Cambridge, MA 02142.

We formulate stereo as a single stage process in
which potential feature point and contour matches
interact to provide local support for estimates of the
disparity functional coefficients (and therefore local surface
structure), not just estimates of disparity at isolated
points. This extends the notion of local support defined
by Marr and Poggio [8]. Those matches that do not participate
in good estimates would fail to win local support
and be eliminated, while the computed coefficients give
the local variation of depth; no extensive surface reconstruction
is needed. This formulation rests on the principle
that locally consistent image deformations are strong
evidence for a locally smooth surface. The approach
relates to the Binocular Raw Primal Sketch principle of
Mayhew and Frisby [9], which proposes that the construction
of extended image primitives (i.e., contours)
should occur simultaneously with the construction of
disparity field primitives (i.e., surface patches). We propose
an algorithm that computes the local disparity functional
from contour matches. As with feature point
matches, those contours that participate in good matches
can be preserved; those that do not can be discarded.

We present in this paper two algorithms for recovering
the locally linear disparity functional. One of the
algorithms integrates the linear disparity functional into
the Marr-Poggio-Grimson matching algorithm (Grimson
]tj) as a measure of local support; the other uses the
linear functional as a measure of contour correspondence.
In Section 2, we analyze relationships between models of
viewpoint geometry, models of depth and models of the
disparity field. In Section 3, we use these relationships to
derive the principles of the disparity functional method
and we present preliminary implementations of the
method.




2.  THE  DISPARITY  FIELD 

In applying the smoothness property to transparent
surfaces, Prazdny [12] stated the following principle of
coherence: "A discontinuous disparity field may be a
superposition of several interlaced continuous disparity
fields, each corresponding to a piecewise smooth surface."
We wish to state a strong mathematical formulation of
coherence, that depth (and therefore disparity) can be
modeled as overlapping analytic regions (essentially piecewise
C4) with singularities at opaque occluding boundaries
or sharp changes in orientation; a region need not
determine actual scene depth for all visual directions that
it subtends, but each visual direction is associated with
only one region and therefore one depth. This principle
of analytic coherence follows from the principle of flow
analyticity used by Waxman and Wohn [20], and establishes
a model for the disparity field.

The piecewise analyticity of disparity allows us to
examine its structure in a small neighborhood of an image
point by Taylor series expansion, and to relate the terms
of the Taylor series to relative image deformation and
surface structure. By fitting a polynomial model to
disparity in a neighborhood, we can recover the Taylor
series coefficients and thereby both image deformation
and surface depth. It is this polynomial model that we
define as the local disparity functional. For the sake of
simplicity, we have restricted the geometric analysis in
this paper to the case of stereo cameras with parallel
optic axes and image planes. This makes the relationships
between the disparity functional and surface structure
very straightforward, and also simplifies the search process
since disparity becomes a scalar field. This
simplification avoids the need to recover generally unknown
vergence and gaze angles of the two cameras (the
interpretation problem). In the complex case, disparity is
a vector field and the terms of the disparity functional
compound surface structure with the vergence and gaze.
The analysis we present here follows from the analyses of
stereo by Koenderink and van Doorn [6], Mayhew and
Longuet-Higgins [10] and those of motion by Longuet-Higgins
and Prazdny [7], Waxman and Ullman [19] and
Waxman and Wohn [20]; see also Grimson [•»].

The disparity functional models the local structure
of the disparity field. The disparity field of a natural
scene will have a global structure that reflects regions
where the underlying surface is analytic, intervening
occluding boundaries and structural edges, and in general
the topology of the visible scene. This must be modeled
by a segmentation of the disparity field into regions
where the coefficients of the disparity functional vary
smoothly, a process discussed in the context of motion in
Waxman [17] and Waxman and Wohn [20], and in the
context of combined motion and stereo analysis in
Waxman and Duncan [18]. Transparent surfaces complicate
this model, since a single neighborhood can contain
multiple interlaced disparity fields. Eastman and Waxman [3]
discusses the use of multiple disparity functionals in a
neighborhood to model this case.


Fig. 1 - Stereo Camera Model with Coordinate Systems


Our objective in the rest of this section is to define
the disparity functional for a small image neighborhood,
and to relate the terms of the functional to local image
deformations and underlying surface structure. We adopt
the notation of upper case for variables in world
coordinates and lower case for image coordinates. We assume a
camera model consisting of two pin-hole cameras with
parallel optic axes and coincident image planes. The
focal point of the left camera is located at the origin of a
right handed coordinate system $(X_l, Y_l, Z_l)$, while the right
camera defines a similarly oriented coordinate system
$(X_r, Y_r, Z_r)$ with its origin at $(-B, 0, 0)$ in the left camera
system; the systems are illustrated in Figure 1. This
gives a stereo baseline of $B$ and the systems are related by
the equation $(X_r, Y_r, Z_r) = (X_l + B, Y_l, Z_l)$. The positive
Z-axis for each system is directed along the line of sight.
The image coordinate systems $(x_l, y_l)$ and $(x_r, y_r)$ are
normal to their respective Z-axes with their origins at $Z = 1$,
so under perspective projection the images are reinverted
and scaled to a focal length of unity. A single point in
the world, $(X_l, Y_l, Z_l)$, projects into the left and right
images, respectively, at


$$(x_l, y_l) = \left(\frac{X_l}{Z_l}, \frac{Y_l}{Z_l}\right), \qquad (x_r, y_r) = \left(\frac{X_l + B}{Z_l}, \frac{Y_l}{Z_l}\right) \qquad \text{(1a,b)}$$


This yields equations for horizontal and vertical disparity
(in the left image coordinates) of

$$\delta_x(x_l, y_l) = x_r - x_l = \frac{B}{Z_l}, \qquad \delta_y(x_l, y_l) = y_r - y_l = 0 \qquad \text{(2a,b)}$$

At this point, the right coordinate systems are
superfluous since we can write everything in terms of the
left systems (or cyclopean systems); we intend to do this,
and therefore drop the subscripts r and l. Also, further
references to disparity concern only the horizontal component.
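
The camera model above can be sketched in a few lines of Python. This is a minimal illustration, assuming the parallel-axis pin-hole geometry of (1a,b) with unit focal length; the function names `project_stereo` and `disparity` and the numeric values are hypothetical, chosen only for the example.

```python
def project_stereo(X, Y, Z, B):
    """Project a world point (left-camera coordinates) into both images.

    Assumes the parallel-axis pin-hole model of equations (1a,b) with a
    focal length of unity; B is the stereo baseline.
    """
    if Z <= 0:
        raise ValueError("depth Z must be positive (point in front of the cameras)")
    left = (X / Z, Y / Z)          # (x_l, y_l)
    right = ((X + B) / Z, Y / Z)   # (x_r, y_r)
    return left, right


def disparity(X, Y, Z, B):
    """Return (horizontal, vertical) disparity as in equations (2a,b)."""
    (x_l, y_l), (x_r, y_r) = project_stereo(X, Y, Z, B)
    return x_r - x_l, y_r - y_l


# Illustrative values only: a point at depth 100 viewed with baseline 5.
d_x, d_y = disparity(10.0, -4.0, 100.0, 5.0)
print(d_x, d_y)
```

The horizontal disparity depends only on depth (it equals B / Z), and the vertical disparity vanishes, which is why the parallel model reduces disparity to a scalar field.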

We now combine the camera model with an analytic
model of depth. Consider a small neighborhood in the
left image centered at the origin $(x, y) = (0, 0)$. We assume
that depth $Z(X, Y)$ is analytic and single valued along the
line of sight in this neighborhood, so we can expand it in
a Taylor series about $(X, Y) = (0, 0)$:


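$$Z = Z_0 + \frac{\partial Z}{\partial X} X + \frac{\partial Z}{\partial Y} Y + \frac{1}{2}\frac{\partial^2 Z}{\partial X^2} X^2 + \frac{1}{2}\frac{\partial^2 Z}{\partial Y^2} Y^2 + \frac{\partial^2 Z}{\partial X \partial Y} XY + \cdots \qquad \text{(3)}$$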


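To second order, this defines a quadric surface patch with
two slopes and three curvatures. To abbreviate the
equations, we will rename the partials as follows:

$$P = \frac{\partial Z}{\partial X}, \quad Q = \frac{\partial Z}{\partial Y}, \quad C_{XX} = \frac{\partial^2 Z}{\partial X^2}, \quad C_{YY} = \frac{\partial^2 Z}{\partial Y^2}, \quad \text{and} \quad C_{XY} = \frac{\partial^2 Z}{\partial X \partial Y}.$$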


Since disparity is directly proportional to the
reciprocal of depth, $\delta = B/Z$ (and depth is positive
definite), the local analyticity of depth implies the
analyticity of disparity. However, we need to convert (3)
to refer to reciprocal depth as a function of image
position $(x, y)$:


$$\frac{1}{Z} = \frac{1}{Z_0}(1 - Px - Qy) - \left(\frac{1}{2} C_{XX} x^2 + \frac{1}{2} C_{YY} y^2 + C_{XY} xy\right) + O(x^3). \qquad \text{(4)}$$


Equation (4) is the desired relationship between reciprocal
depth and image position; its derivation follows from
Waxman and Ullman [19].
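
The step from (3) to (4) can be sketched as follows. This is a short worked outline rather than the derivation given in the cited reference; it assumes the unit-focal-length projection $X = xZ$, $Y = yZ$ of (1a), and the shorthand $q(x, y)$ for the quadratic form is introduced here only for brevity.

```latex
% Substitute X = xZ, Y = yZ into (3), with
% q(x,y) = (1/2) C_XX x^2 + (1/2) C_YY y^2 + C_XY xy :
Z = Z_0 + (Px + Qy)\,Z + q(x, y)\,Z^2 + \cdots
% Divide through by Z Z_0 and solve for 1/Z:
\frac{1}{Z} = \frac{1}{Z_0}\,(1 - Px - Qy) - \frac{Z}{Z_0}\,q(x, y) + \cdots
% Since Z / Z_0 = 1 + O(x) and q is already second order,
% (Z / Z_0) q = q + O(x^3), which gives
\frac{1}{Z} = \frac{1}{Z_0}\,(1 - Px - Qy) - q(x, y) + O(x^3)
```

The first order terms carry a factor $1/Z_0$ while the curvature terms do not, which is why the slopes and curvatures enter the disparity functional with different scalings.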

We can now substitute (4) into the equation for
disparity (2a) to get a functional approximation for
horizontal disparity.


$$\delta(x, y) = \frac{B}{Z_0}(1 - Px - Qy) - B\left(\frac{1}{2} C_{XX} x^2 + \frac{1}{2} C_{YY} y^2 + C_{XY} xy\right) \qquad \text{(5)}$$


For image neighborhoods corresponding to curved surface
patches, this second order approximation is locally valid
with an accuracy depending on the curvatures of the
patch. For the simple parallel camera model, the
rotational terms of vergence, gaze and cyclotorsion do not
appear in the disparity equations. As a result, for
neighborhoods corresponding to planar surface patches all
terms beyond first order vanish and a linear functional
approximation is globally valid. In this case, horizontal
disparity can be written as

$$\delta(x, y) = \frac{B}{Z_0}(1 - Px - Qy) \qquad \text{(6)}$$


If the surface is planar, then reciprocal depth and
disparity are linear functions of image coordinates. This
relation was used by Castan and Shen [2], who observed that
Z (depth in the world) is then a hyperbolic function of
image coordinates; it also follows from the planar model
used by Mayhew and Longuet-Higgins [10].

We call this second order polynomial approximation
to disparity the disparity functional and define its terms
as follows:


Fig. 2 - Deformations of the Horizontal Disparity Field Under
the Parallel Camera Model


$$\delta(x, y) = a + bx + cy + dx^2 + ey^2 + fxy \qquad \text{(7)}$$

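These terms relate directly to the local Taylor series of
disparity (8), as well as the Taylor series of depth. These
relations are given in (9) below.

$$\delta(x, y) = \delta(0, 0) + \left.\frac{\partial \delta}{\partial x}\right|_0 x + \left.\frac{\partial \delta}{\partial y}\right|_0 y + \frac{1}{2}\left.\frac{\partial^2 \delta}{\partial x^2}\right|_0 x^2 + \frac{1}{2}\left.\frac{\partial^2 \delta}{\partial y^2}\right|_0 y^2 + \left.\frac{\partial^2 \delta}{\partial x \partial y}\right|_0 xy + \cdots \qquad \text{(8)}$$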


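$$a = \delta(0, 0) = \frac{B}{Z_0}, \qquad b = \frac{\partial \delta}{\partial x} = -\frac{B}{Z_0} P \qquad \text{(9a,b)}$$

$$c = \frac{\partial \delta}{\partial y} = -\frac{B}{Z_0} Q, \qquad d = \frac{1}{2}\frac{\partial^2 \delta}{\partial x^2} = -\frac{1}{2} B C_{XX} \qquad \text{(9c,d)}$$

$$e = \frac{1}{2}\frac{\partial^2 \delta}{\partial y^2} = -\frac{1}{2} B C_{YY}, \qquad f = \frac{\partial^2 \delta}{\partial x \partial y} = -B C_{XY} \qquad \text{(9e,f)}$$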
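
The mapping between surface parameters and functional coefficients in (9) can be written out directly. The sketch below is illustrative only: the function names are hypothetical, NumPy is assumed to be available, and the numeric values (B = 5, Z0 = 100, P = Q = 1) are chosen simply to give round numbers.

```python
import numpy as np


def functional_from_surface(B, Z0, P, Q, Cxx=0.0, Cyy=0.0, Cxy=0.0):
    """Coefficients (a, b, c, d, e, f) of the disparity functional, per (9a-f)."""
    a = B / Z0
    b = -B * P / Z0
    c = -B * Q / Z0
    d = -0.5 * B * Cxx
    e = -0.5 * B * Cyy
    f = -B * Cxy
    return np.array([a, b, c, d, e, f])


def evaluate_functional(coeffs, x, y):
    """Evaluate a + b x + c y + d x^2 + e y^2 + f x y, as in (7)."""
    a, b, c, d, e, f = coeffs
    return a + b * x + c * y + d * x * x + e * y * y + f * x * y


def surface_from_linear_terms(B, a, b, c):
    """Invert (9a-c): recover depth Z0 and slopes P, Q from a, b, c."""
    return B / a, -b / a, -c / a


# Illustrative planar patch: Z = 100 + X + Y with baseline 5.
coeffs = functional_from_surface(B=5.0, Z0=100.0, P=1.0, Q=1.0)
print(coeffs[:3])
print(surface_from_linear_terms(5.0, *coeffs[:3]))
```

For a planar patch the linear functional reproduces B / Z exactly at every image point, which is the global validity noted for equation (6); for a curved patch the agreement is only to second order.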


We can restrict acceptable values for the disparity
functional coefficients by using a disparity gradient limit. If
we favor lower values of $\sqrt{b^2 + c^2}$ (e.g., by thresholding
this magnitude at 1), then we are implicitly favoring
surface reconstructions closer to the fronto-parallel plane.
This is a relative and isotropic disparity gradient
constraint. (Koenderink and van Doorn [6], Arnold and
Binford [1].)
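
In terms of the surface parameters, the gradient magnitude can be read off from relations (9a-c). The identity below is a short worked consequence of those relations, not a formula stated in the original text:

```latex
\sqrt{b^2 + c^2}
  = \frac{B}{Z_0}\sqrt{P^2 + Q^2}
  = a\,\sqrt{P^2 + Q^2}
```

So for a fixed threshold the limit bounds the surface slope relative to the ratio $Z_0 / B$: a patch of given slant produces a smaller disparity gradient the farther away it lies, and a fronto-parallel patch ($P = Q = 0$) produces none.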

Following the work of Koenderink and van Doorn [6]
and Waxman and Ullman [19], but particularly the
presentation in Wohn [21,23] for image flow deformations,
we can interpret the terms of the disparity functional as
a local transformation between the left and right images.
Essentially, we have chosen the set $\{1, x, y, x^2, y^2, xy\}$
as a (non-orthogonal) basis for the transformation. This
basis is illustrated in Figure 2; other bases are described
in [21,23]. In modeling the more physiologically plausible
camera geometry of cyclotorsion, Koenderink and van
Doorn [6] used an alternative basis. In this case, $\delta_y$ is
non-zero, and they defined the transformation in the
cyclopean coordinate system located midway between the
eyes. Their basis characterized the local transformation to
first order as a translation, rotation and deformation
(compression and stretch along orthogonal axes). They
showed that the deformation component is proportional
to the gradient of reciprocal depth (i.e., a simple function
of slant and tilt) and is invariant to changes in fixation
point.


3.  THE  DISPARITY  FUNCTIONAL  METHOD 

Most theories of stereopsis divide the necessary
computations into two distinct stages: establishing feature
correspondence and reconstructing surface structure.
This division has important implications for matching
constraints based on models of scene depth. In order to
apply these constraints, some sort of surface
reconstruction must be performed during matching to evaluate how
closely the proposed matches fit the surface model. The
extent of this "correspondence reconstruction" varies
from algorithm to algorithm, but it rarely goes beyond
the heuristic selection of an optimal "needle" or
"wireframe" structure for raw disparity values at isolated
feature points (the raw 2.5-D sketch of Marr); the full
reconstruction phase must then interpolate smooth
surfaces over the wireframe (the full 2.5-D sketch of Marr,
cf. Grimson [4] and Terzopoulos [15]). As an alternative,
we present efforts towards a theory of stereo which
integrates the two stages of correspondence and
reconstruction. The alternative is a single stage computation
which selects those matches and disparity interpretations
which directly, not indirectly through heuristics, give the
best locally smooth surface reconstruction.

Image feature points that arise from texture
markings sample the analytic structure of the underlying
surface. Given a set of potential feature point matches in a
neighborhood, we can estimate the coefficients of the local
disparity functional by least squares. If the set of
potential matches is correct, the local disparity functional
should be a good fit. If the set contains a significant
number of incorrect matches, the fit should be poor. The
least squares residual thus serves as a measure of local
support.

If a good fit cannot be found in a neighborhood, this
is evidence that the local surface may be rough at that
scale, there are occluding edges in the neighborhood or
there are multiple transparent surfaces. We list three
techniques for dealing with these problems. The first is
to fit the disparity functional at multiple scales. The
functional serves to approximate the disparity field in a
neighborhood; this approximation may be valid at one
scale but not at another. An example would be a field of
grass, which is smooth at a large scale but rough at a fine
scale (the stereo pair in Figure 5 exhibits this property).
This shows that image deformations take place at various
scales and may need to be similarly recovered. The
second is to fit the disparity functional to multiple
overlapping neighborhoods, and use a modified split-merge
approach to locating disparity discontinuities and
improving the fit. The modification comes from using
overlapping neighborhoods, so that the surfaces fit to adjacent
neighborhoods can be compared in the region of overlap
(cp. overlap compatibility in Waxman and Wohn [20],
Waxman and Duncan [18]). The third is to fit multiple
disparity functionals to a single neighborhood. As
Prazdny [12] noted, transparent (or, more properly,
intermittent) surfaces can result in interlaced disparity fields;
an example would be a chain-link fence a short distance
in front of a brick wall. Disentangling the surfaces
requires identifying the two planar disparity functionals
that fit subsets of the feature points.

Using the disparity functional residual as a measure
of local support rests on the principle that locally
consistent image deformations are strong evidence for a
locally smooth surface. Other stereo matching algorithms
that have emphasized neighborhood deformations include
those of Koenderink and van Doorn [6], Prazdny [12],
Pollard, Mayhew and Frisby [11] and Quam [13]. The
algorithm of Koenderink and van Doorn [6] extracts the
deformation component of the disparity field by a
comparison of local texture measures in the two images; this
suggests an attractive though unproven approach to
computing the local disparity functional without establishing
feature point correspondences, like the diffrequency stereo
of Tyler and Sutton [16] or the approach to motion of
Kanatani [5]. The algorithms of Prazdny [12] and
Pollard, Mayhew and Frisby [11] select, by a pairwise voting
scheme between feature point matches, the disparity at a
point that minimizes the first order deformation in
the surrounding neighborhood. Their measure of local
support avoids explicitly calculating the local deformation.
The algorithm of Quam [13] uses a model of depth for
warping, or deforming, a neighborhood for intensity
correlation; however, the model is expressed in world, not
image coordinates.

We have implemented two algorithms which
demonstrate the use of the linear disparity functional for
correspondence and reconstruction. The first algorithm
uses contours as the matching primitives. A single match
of two contours, at least one non-linear, provides enough
structure to compute a disparity functional. We select
those contour matches that yield low residual functionals
and group them to expand the scope of the
approximation. This is similar to the measures of
contour similarity devised by Suk and Kang [14], and to the
algorithm discussed in Yuille and Poggio [24]. The
second algorithm uses individual zero-crossings as the
primitives. The left image is divided into neighborhoods,
and each neighborhood undergoes a search for the
disparity functional that best fits a subset of the local
zero-crossing matches. This algorithm is a variation on the
multiresolution zero-crossing matching algorithms of
Marr-Poggio-Grimson (Grimson [4]); the major difference
is its use of the disparity functional to compute both the
local structure of depth and a measure of local support.
The algorithm is also similar to the multiresolution
correlation algorithm of Quam [13].


Both algorithms use a least squares procedure to fit
either a first order or second order polynomial $\delta = \delta(x, y)$
to a set of n points $\{(x_i, y_i, \delta_i)\}$, where $(x_i, y_i)$ are in left
image coordinates and $\delta_i = x_{r,i} - x_{l,i}$. The average residual
error is calculated as

$$E = \frac{1}{n} \sum_{i=1}^{n} \left| \delta_i - \delta(x_i, y_i) \right| \qquad \text{(10)}$$
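
A least squares fit of this kind can be sketched with NumPy as below. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the residual is taken to be the mean absolute deviation of the fitted polynomial from the measured disparities, which is an assumption about the exact form of (10).

```python
import numpy as np


def design_matrix(x, y, order=1):
    """Columns of the basis {1, x, y} (order 1) or {1, x, y, x^2, y^2, xy} (order 2)."""
    cols = [np.ones_like(x), x, y]
    if order == 2:
        cols += [x * x, y * y, x * y]
    return np.column_stack(cols)


def fit_disparity_functional(x, y, delta, order=1):
    """Least squares fit of the disparity functional; returns (coeffs, residual)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    A = design_matrix(x, y, order)
    coeffs, *_ = np.linalg.lstsq(A, delta, rcond=None)
    # Assumed residual: average absolute error of the fit.
    residual = float(np.mean(np.abs(delta - A @ coeffs)))
    return coeffs, residual


# Illustrative data: matches sampled from a planar disparity field,
# then the same set with every fifth match corrupted.
rng = np.random.default_rng(0)
x = rng.uniform(-0.1, 0.1, 50)
y = rng.uniform(-0.1, 0.1, 50)
good = 0.05 - 0.05 * x - 0.05 * y
bad = good.copy()
bad[::5] += 0.02
print(fit_disparity_functional(x, y, good))
print(fit_disparity_functional(x, y, bad))
```

A correct set of matches gives a residual near zero, while the corrupted set gives a visibly larger one, which is the sense in which the residual acts as a measure of local support.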

3.1 Contour-based algorithm

Contours are defined as weakly monotonically
decreasing eight-connected chains of edge points. This
means that extended image contours are broken at
vertical minima and maxima, and each contour intersects a
horizontal scan line only once. After this processing, each
image is represented by sets of contours which are in turn
represented as sequences of points. For each contour in
the left image, the matching algorithm attempts to find a
contour in the right image that best satisfies a measure of
similarity based on the linear disparity functional.

STEP 1 Contours as defined above, with a minimum
length of H pixels, are extracted from the left and right
images. Horizontal runs of edge points are approximated
by the pixel closest to their midpoint.

STEP 2 If a pair of contours share at least i rows, and
disparity is positive along the contours, it is marked as a
possible match.

STEP 3 For each possible match of these contours, a
linear disparity functional is fit by least squares and the
residual computed.

STEP 4 If the linear coefficients yield a disparity
gradient above 1, the match is rejected.

STEP 5 For each contour in the left image, the possible
matches are ranked by the magnitude of the residual.
Matches with a residual more than twice the minimum
are rejected. If all matches but the minimum have been
eliminated, it is accepted.

STEP 6 If a right contour now participates in an
accepted match, the other ambiguous matches it
participates in are rejected. This may leave a left contour with
a unique match, which is accepted. Remaining ambiguous
matches are eliminated by accepting the minimum
residual match for each left contour, and then for each right
contour.

STEP 7 If two connected contours in the left image
match two connected contours in the right, join them and
recompute the disparity functional. If the new residual is
much larger, reject the join.
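
Steps 3 to 5 above can be sketched for a single left contour as follows. This is a simplified illustration under stated assumptions, not the implementation described in the text: each candidate right contour is assumed to be already resampled on the same rows as the left contour, the residual is taken as a mean absolute error, and the names `fit_linear` and `rank_candidates` are hypothetical.

```python
import numpy as np


def fit_linear(x, y, delta):
    """Fit delta = a + b x + c y by least squares; return (coeffs, mean abs residual)."""
    A = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(A, delta, rcond=None)
    return coeffs, float(np.mean(np.abs(delta - A @ coeffs)))


def rank_candidates(left, candidates, gradient_limit=1.0, ratio=2.0):
    """Rank candidate right contours for one left contour.

    left       : (n, 2) array of left image points (x, y), one per shared row
    candidates : dict mapping a label to an (n,) array of right image x values
    Returns the surviving (residual, label, coeffs) tuples, best first.
    """
    scored = []
    for label, x_right in candidates.items():
        delta = x_right - left[:, 0]
        if np.any(delta <= 0):                      # step 2: positive disparity
            continue
        coeffs, residual = fit_linear(left[:, 0], left[:, 1], delta)   # step 3
        if np.hypot(coeffs[1], coeffs[2]) > gradient_limit:            # step 4
            continue
        scored.append((residual, label, coeffs))
    if not scored:
        return []
    scored.sort(key=lambda item: item[0])           # step 5: rank by residual
    best = scored[0][0]
    return [item for item in scored if item[0] <= ratio * best]


# Illustrative curved contour and two candidate matches.
t = np.linspace(-0.1, 0.1, 21)
left = np.column_stack([5.0 * t ** 2, t])
true_delta = 0.05 - 0.05 * left[:, 0] - 0.05 * left[:, 1]
candidates = {
    "consistent": left[:, 0] + true_delta,
    "wiggly": left[:, 0] + true_delta + 0.01 * np.sin(40.0 * t),
}
print([label for _, label, _ in rank_candidates(left, candidates)])
```

The contour in the example is deliberately curved: as the text notes, at least one non-linear contour is needed for a single match to determine the functional, since points on a straight line leave the fit underdetermined.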

We illustrate the results of this algorithm with two
synthetic stereo pairs. The images are binary, IVO x
pixels in resolution and were generated with the
projection equations (1a) and (1b) with a stereo baseline B of 3
and a field of view of 20 degrees. The first stereo pair
(Figure 3a) views a planar surface $Z = 100 - X - Y$, the
second (Figure 4a) an elliptic paraboloid
$Z = 100 - X - Y + 0.0175 X^2 + 0.0175 Y^2$. The contours were
generated by orthographically projecting ellipses onto the
surfaces. There are about 33 contours in each image.


Fig. 3 - Contours on a Plane: a) stereo pair

Fig. 4 - Contours on an Elliptic Paraboloid: a) stereo pair


When run on these images, the contour algorithm was
successful in matching all contours except for a few that
subtend small fields of view. This is the major drawback
of this approach, for if the image texture is too fine to
yield contours of adequate field of view, then the
recovered disparity gradients will not be reliable. In this
case contours need to be grouped together until the field
of view is adequate. This is implicit in the neighborhood
algorithm presented next. Figures 3b and 4b show, in
world coordinates, the original surfaces, while Figures 3c
and 4c show the recovered surfaces. A second order
disparity functional was used during surface
reconstruction in Figure 4c.


3.2 Neighborhood-based algorithm

The neighborhood algorithm avoids the need to
search for contour groupings by fitting the disparity
functional to a set of neighborhoods in the left image. Each
neighborhood searches for the disparity functional that
best fits a subset of the local zero-crossing matches at two
scales. The search process for the best fit is brute force: a
neighborhood in the left image is positioned over the
right image at each possible disparity offset, and those
matches which are closest to this constant disparity, and
fall within a matching window, are accepted. These
matches are fit with a disparity functional, and the
average residual is computed. The process of selecting matches
and computing the functional is then iterated; outlying
matches which differ significantly from the functional
approximation are rejected, new matches which now fall
within the window are accepted, and the functional is
recomputed on the new set. This is first done at the low
resolution and then repeated at the high resolution, with
a significant difference: at each disparity offset, the high
resolution begins with the last disparity functional
computed at the low resolution. The disparity offset with the
lowest high resolution residual is accepted as correct and
the disparity functional computed at that offset is used in
reconstruction. The following steps are performed for
each neighborhood.

STEP 1 The low resolution left and right images are
scanned to compile a list of potential matches for each
edge point in the left image. The restrictions are positive
disparity (i.e., match lies to the left) and same
zero-crossing sign. The same is done at the high resolution.

STEP 2 The initial disparity offset is initialized to 0, the
final to the value that would leave the neighborhood
off the left side of the right image. The next steps are
performed for each offset between the initial and final
values.

STEP 2a For each point in the low resolution left
image, the disparity closest to the disparity offset is
selected unless it lies outside of a window centered at the
disparity offset. The size of the window was ±W, where
W was the width of the inner positive region of the
zero-crossing operator. A linear disparity functional is fit to
the matches, and the residual E computed.

STEP 2b Step 2a is iterated at low resolution, with
one difference. The matches selected are now those within
3E of the disparity predicted by the disparity functional
computed at the last iteration. The iteration proceeds
until one of four conditions is satisfied: more than half
the points have no match, the residual stops changing,
the disparity gradient of the functional exceeds 1, or an
arbitrary limit on the number of iterations is reached.

STEP 2c Step 2a is now applied to the high
resolution images with the disparity offset replaced by the low
resolution disparity functional. The size of the search
window is again ±W, where W was the width of the inner
positive region of the high resolution zero-crossing
operator.

STEP 2d Step 2b is applied to the high resolution
images.

STEP 3 The disparity offset with the minimum high
resolution disparity functional residual is accepted (provided
that at least 60% of the points had matches). Both the high
resolution disparity functional computed at this offset,
and the matches that agree, are accepted.
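
The offset search and iteration of steps 2, 2a and 2b can be sketched at a single resolution as follows. This is a simplified illustration, not the implementation described in the text: it omits the low-to-high resolution hand-off of steps 2c and 2d and the match-percentage condition of step 3, takes the residual as a mean absolute error, and adds a small floor `min_window` on the iterated window so that an exact fit on synthetic data does not shrink the window to zero. All function and parameter names are hypothetical.

```python
import numpy as np


def fit_linear(x, y, d):
    """Fit d = a + b x + c y by least squares; return (coeffs, mean abs residual)."""
    A = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(A, d, rcond=None)
    return coeffs, float(np.mean(np.abs(d - A @ coeffs)))


def select_matches(points, candidates, predict, window):
    """For each point keep the candidate disparity closest to the prediction,
    provided it lies within the window around that prediction."""
    xs, ys, ds = [], [], []
    for (x, y), cands in zip(points, candidates):
        if len(cands) == 0:
            continue
        target = predict(x, y)
        best = min(cands, key=lambda d: abs(d - target))
        if abs(best - target) <= window:
            xs.append(x)
            ys.append(y)
            ds.append(best)
    return np.array(xs), np.array(ys), np.array(ds)


def fit_at_offset(points, candidates, offset, window,
                  max_iter=3, tol=1e-6, min_window=0.5):
    """Steps 2a and 2b at one disparity offset; returns (coeffs, residual)."""
    predict = lambda x, y: offset              # step 2a: constant disparity
    coeffs, residual = None, np.inf
    for _ in range(max_iter):                  # arbitrary iteration limit
        x, y, d = select_matches(points, candidates, predict, window)
        if len(d) < max(3, len(points) / 2):   # more than half unmatched
            break
        new_coeffs, new_residual = fit_linear(x, y, d)
        if np.hypot(new_coeffs[1], new_coeffs[2]) > 1.0:   # gradient limit
            break
        converged = abs(new_residual - residual) < tol     # residual stopped changing
        coeffs, residual = new_coeffs, new_residual
        if converged:
            break
        # Step 2b: predict from the last functional, window of 3E.
        predict = lambda x, y, c=coeffs: c[0] + c[1] * x + c[2] * y
        window = max(3.0 * residual, min_window)
    return coeffs, residual


def best_offset(points, candidates, offsets, window):
    """Return (residual, offset, coeffs) for the lowest-residual offset, or None."""
    results = []
    for offset in offsets:
        coeffs, residual = fit_at_offset(points, candidates, offset, window)
        if coeffs is not None:
            results.append((residual, offset, coeffs))
    return min(results, key=lambda r: r[0]) if results else None
```

A typical call passes an array of left image points, a list of candidate disparities per point, and a range of offsets, for example `best_offset(points, candidates, range(0, 41, 4), window=4.0)`. Because each neighborhood is searched independently, the outer loop over neighborhoods is the natural place to parallelize.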

This algorithm has been applied to the three
synthetic stereo pairs in Figures 5, 6 and 7. The images are
320 x 320 pixels in resolution and were generated with the
projection equations (1a) and (1b) with a stereo baseline B
of 5 and a field of view of 25 degrees. The first stereo
pair (Figure 5a) views a planar surface $Z = 100 + X + Y$,
the second (Figure 6a) an elliptic paraboloid
$Z = 100 + 0.05 X^2 + 0.05 Y^2$, the third (Figure 7a) a
hyperbolic paraboloid $Z = 100 + 0.05 X^2 - 0.05 Y^2$. The same
random dot texture was orthographically projected onto each
surface. Zero-crossings of a $\nabla^2 G$ operator were found at
two scales with inner windows of 16 and 8 pixels,
respectively. The zero-crossing contours shown are not closed
because horizontal runs were approximated by the pixel
closest to the midpoint. Two neighborhood sizes were
used: 64x64 (5x5 degrees) and 32x32 (2.5x2.5 degrees).
Each neighborhood size was considered at two
resolutions. Only the 256x256 central areas of the left images
were matched. The match windows used in steps 2a and
2c were ±16 and ±8 pixels respectively, based on the Marr
and Poggio [8] analysis of zero-crossing densities. (This
implies that the maximum disparity change across the
64x64 window in step 2a is 32 pixels, giving a maximum
disparity gradient less than 1.) A limit of three iterations
was set for combined totals of steps 2a and 2b, and 2c
and 2d.

The original and recovered surfaces are displayed in
Figures 5d-g, 6d-g, and 7d-g. If we consider as correct
those zero-crossing matches that are in close agreement
with correct disparity functionals, about 90% of the
zero-crossings in each image were matched. Despite the
small scale roughness of the images, the surfaces were
recovered well in all but a few 32 x 32 neighborhoods. The
correct disparity offset was found for these cases, but the
disparity functional did not approximate the surface well.
This seems due to two effects. The failures occurred on the
left side of the images
of planar and elliptical surfaces. The part of the surface
visible at the left edge of the left image is cut off on the
right. This distorts the zero-crossings slightly and
eliminates some matches so the horizontal field of view is
shortened. The problem may also be due to the steeper
slope on the side of the elliptic surface, since the search
strategy clearly favors low values for the disparity
gradient. There were two substantial failures out of 196
32x32 neighborhoods.

The graphs in Figures 8 and 9 plot the percentage of
zero-crossings uniquely matched and the residual,
respectively, as a function of disparity offset. The residual is
given in units of pixels. (Uniquely matched means there
was only one possible zero-crossing match in the search
window.) There are six graphs in Figures 8 and 9, one for
each of the iterations at low and high resolution. They
are for the upper right 64x64 neighborhood in the planar
stereo pair of Figure 5. The parameters used in
generating the surface were $Z_0 = 100$, $P = 1.0$ and $Q = 1.0$, which
give theoretical disparity functional coefficients of
a = 0.050, b = -0.050, c = -0.050. The disparity offset
selected by the algorithm for this neighborhood was -22,
the coefficients of the recovered disparity functional were
a = 0.0501, b = -0.0484, c = -0.0476, the residual 0.42 pixels
and the recovered depth parameters $Z_0 = 99.7$, $P = 0.97$
and $Q = 0.95$. There were 282 and 511 zero-crossings in
this neighborhood at the low and high resolutions,
respectively, of which 275 and 460 were given matches. The
plot of match percentage for the first low resolution
iteration (Figure 8a) shows a variation on the local support
measure of Marr and Poggio [8]; probabilistic arguments
based on the density of zero-crossings justify rejecting a
disparity offset if less than 70% of the zero-crossings in a
neighborhood are uniquely matched, i.e., only one
potential match appeared within a search window on the scale
of the zero-crossing operator. Figure 8a clearly shows the
Marr-Poggio constraint, since close to the correct
disparity offset the match percentage is 80% but elsewhere it
generally stays below 70%. The graphs of the residual in
Figure 9 show a distinctive minimum in the area of the
correct disparity offset. The residual at the third high
resolution iteration (Figure 9f) is the measure used by the
algorithm to select the minimum residual for a disparity
offset, but the other five iterations show a minimum in
the same location. Actually, iterating at the high
resolution had little effect on the residual. Away from the
correct offset, the residual hovered around the space
constant $\sigma$ of the zero-crossing operator.

The algorithms and experiments presented here have
several shortcomings, and mainly serve to illustrate the
disparity functional residual as a measure of local
support. The neighborhood algorithm is slow (on the order
of two CPU hours when implemented in C on a DEC VAX
11/785). But this time resulted from running all six
iterations at every disparity offset to collect complete
data for illustrative purposes. It is unnecessary to sample
disparity offsets closer than about half the mask size used
in the contour extraction. In addition, the iterations at
high resolution are unnecessary. We expect a reduction in
computation by a factor of ten. Also, different
neighborhoods can be processed simultaneously on a parallel
machine. In future work, we expect to refine the
algorithms and subject them to more exhaustive testing on
natural scenes. The major issues to be investigated are
how best to recover the deformation parameters in more
complex scenes with transparency and occlusions, and
how to segment the resulting disparity field. These issues
are discussed in greater detail in Eastman and Waxman
[3].

a) stereo pair   b) stereo pair of low resolution zero crossings   c) stereo pair of high resolution zero crossings

REFERENCES 

[1] R. D. Arnold and T. O. Binford, "Geometric Constraints in
Stereo Vision", Proceedings SPIE, Vol. 238, pp. 281-292, San
Diego, CA, 1980.

[2] S. Castan and J. Shen, "A Stereo Vision Algorithm taking into
account the perspective distortions", Proceedings, Seventh
International Conference on Pattern Recognition, pp. 441-4[illegible],
1984.

[3] R. D. Eastman and A. M. Waxman, "Using Disparity Functionals
for Stereo Correspondence and Surface Reconstruction",
University of Maryland, Center for Automation Research
Technical Report 145, October, 1985.

[4] W. E. L. Grimson, From Images to Surfaces (Cambridge:
MIT Press), 1981.

[5] K. Kanatani, "Structure from Motion without Correspondence:
General Principle", Proceedings, Ninth International Joint
Conference on Artificial Intelligence, pp. 886-888, 1985.

[6] J. J. Koenderink and A. J. van Doorn, "Geometry of Binocular
Vision and a Model for Stereopsis", Biological Cybernetics, Vol.
21, pp. 29-35, 1976.

[7] H. C. Longuet-Higgins and K. Prazdny, "The Interpretation of a
Moving Retinal Image", Proceedings Royal Society London,
Vol. B 208, pp. 385-397, 1980.

[8] D. Marr and T. Poggio, "A Theory of Human Stereo Vision",
Proceedings Royal Society London, Vol. B 204, pp. 301-328,
1979.

[9] J. E. W. Mayhew and J. P. Frisby, "Computational and
Psychophysical Studies Towards a Theory of Human Stereopsis",
Artificial Intelligence, Vol. 17, pp. 349-385, 1981.

[10] J. E. W. Mayhew and H. C. Longuet-Higgins, "A Computational
Model of Binocular Depth Perception", Nature, Vol. 297, pp.
[illegible].

[11] S. B. Pollard, J. E. W. Mayhew and J. P. Frisby, "Disparity
Gradients and Stereo Correspondences", Technical Report,
Sheffield University, 1985.

[12] K. Prazdny, "Detection of Binocular Disparities", Biological
Cybernetics, Vol. 52, pp. 93-99, 1985.

[13] L. Quam, "Hierarchical Warp Stereo", Proceedings, Image
Understanding Workshop, pp. 149-[illegible], 1984.

[14] M. Suk and H. Kang, "New Measures of Similarity between
Two Contours Based on Optimal Bivariate Transforms",
Computer Vision, Graphics and Image Processing, Vol. 26, pp.
168-182, 1984.

[15] D. Terzopoulos, "Multiresolution Computational Processes for
Visual Surface Reconstruction", Computer Vision, Graphics
and Image Processing, Vol. 24, pp. 52-96, 1983.

[16] C. W. Tyler and E. E. Sutter, "Depth from Spatial Frequency
Difference: an Old Kind of Stereopsis?", Vision Research, Vol.
19, pp. 859-865, 1979.

[17] A. M. Waxman, "An Image Flow Paradigm", Proceedings 2nd
IEEE Workshop on Computer Vision: Representation and
Control, Annapolis, MD, 1984.

[18] A. M. Waxman and J. H. Duncan, "Binocular Image Flows:
Steps toward Stereo-Motion Fusion", University of Maryland,
Center for Automation Research Technical Report 119, May,
1985.

[19] A. M. Waxman and S. Ullman, "Surface Structure and 3-D
Motion From Image Flow: A Kinematic Analysis", University
of Maryland, Center for Automation Research Technical Report
24, October, 1983.

[20] A. M. Waxman and K. Wohn, "Contour Evolution,
Neighborhood Deformation and Global Image Flow: Planar
Surfaces in Motion", University of Maryland, Center for
Automation Research Tech Report 58, April, 1984. Also see
International Journal of Robotics Research, Vol. 4, 1985.

[21] K. Wohn, "A Contour-Based Approach to Image Flow", Ph.D.
Dissertation, Department of Computer Science, University of
Maryland, 1984.

[22] K. Wohn and A. M. Waxman, "Contour Evolution,
Neighborhood Deformation and Local Image Flow: Curved
Surfaces in Motion", University of Maryland, Center for
Automation Research Technical Report 134, July, 1985.

[23] K. Wohn and A. M. Waxman, "The Analytic Structure of
Image Flows: Deformation and Segmentation", University of
Maryland, Center for Automation Research Technical Report,
(in preparation).

[24] A. L. Yuille and T. Poggio, "A Generalized Ordering Constraint
for Stereo Correspondence", MIT A.I. Memo 777, May, 1984.


Fig. 6 - Random Texture on an Elliptic Paraboloid
Panels: a) stereo pair; b) stereo pair of low resolution zero crossings; c) stereo pair of high resolution zero crossings

Fig. 7 - Random Texture on a Hyperbolic Paraboloid
Panels: a) stereo pair; b) stereo pair of low resolution zero crossings; c) stereo pair of high resolution zero crossings


Fig. 8 - Percentage of uniquely matched features for single neighborhood.
Panels: a) First iteration, low resolution; b) Second iteration, low resolution; c) Third iteration, low resolution; d) First iteration, high resolution; e) Second iteration, high resolution; f) Third iteration, high resolution
Vertical axis - percentage of points with match.
Horizontal axis - disparity offset in pixels.

Fig. 9 - Residual for single neighborhood.
Panels: a) First iteration, low resolution; b) Second iteration, low resolution; c) Third iteration, low resolution; d) First iteration, high resolution; e) Second iteration, high resolution; f) Third iteration, high resolution
Vertical axis - mean absolute disparity error in pixels.
Horizontal axis - disparity offset in pixels.


Evidence  Combination  for  Vision  using  Likelihood  Generators 

David  Sher 

Computer  Science  Department 
University  of  Rochester 
Rochester  NY  14627 
November 7, 1985


1. Objectives

My project is about building image understanding
systems. In particular, I am studying how to design a
system that takes a color image from a camera and
finds the outlines of the objects in the image. A
representation of outlines is often called a segmentation
of the image. The task of generating a segmentation
is called image segmentation.

Outlines are extremely useful for other low-level
vision tasks. The algorithms that benefit most from
knowing outlines are algorithms that are based on a
local continuity or planarity assumption. The standard
relaxation based shape from shading [12] and optical
flow [4] [5] algorithms are examples of such algorithms.
Intermediate level vision algorithms have been
developed to take outlined regions and associate them
with known objects [11] [15]. Thus the task of image
segmentation can be considered important, even
fundamental, to computer vision. Few algorithms (other
than segmentation algorithms of course) can be run
effectively on an unsegmented image.


The objective in this project is to build programs
that can take expert knowledge about the appearance
of outlines in images and derive a segmentation. Any
knowledge about the structure of images should be
representable in my framework. Using the expert
knowledge, the system should return the most useful
information possible about the outlines of the objects
in the image.

Much of the work done in computer vision has
been developed with different goals in mind. Because
of the difference in goals, the algorithms some people
developed have serious shortcomings from my
viewpoint. One alternate set of objectives is that held
by researchers inspired by biological modeling. An
excellent work in biological modeling is that of Fleet
[9]. His work is on the temporal and spatial characteristics
of center-surround operators.

When working on modeling one tries to develop
algorithms whose behavior closely approximates that of
a human vision system. An example of such approximation
is to have only band limited operators because
the cells on the mammalian optic nerve have been
shown to be band limited. For my work this limitation
is not sufficient reason to use exclusively band limited
operators. If it is shown that the phenomena that I am
trying to detect are band limited or that a band limited
operator is sufficient to detect the phenomena without
loss of accuracy then I would use band limited operators.


I claim that the criterion in most biological
modeling studies is to come up with simple and
[illegible] operators. Nonlinearities in operators must
be justified by showing some [illegible] in the
behavior of the [illegible] system that cannot be
modeled with linear operators. For my work, I have to
justify linearity in an operator by demonstrating some
linearity in the world.

Another viewpoint of workers in computer vision
is derived from working on signal processing. Signal
processing shares many of the objectives of computer
vision. Signal processing developed as a discipline on
the development of radar and sonar. Signal processing
predated electronic computation as we know it today.
Thus much of the work in signal processing is on the
behavior of detectors that can be made from simple
electronic circuitry. Many optimality results in signal
processing are only for linear operators.

Signal analysis originated with analysis of a single
time varying signal and many of its results apply only
to such a model. Such models assume that images
have properties common to time varying signals that
images don't share, like causality. When a signal has a
causal model (in work in signal analysis) then the best
prediction of the value of a signal at a point can be
made without knowing the behavior of the signal in
the future, only the past behavior. However, it is not
true that the left side of the image predicts the values
of the middle ignoring the right side. Despite this fact,
image processing operators have been recently
described that assume such causality [?].

2. Expertise

My project studies image interpretation. An
image interpretation system takes assumptions about
the structure of the scene being observed and uses
them in combination with the image to generate an
interpretation. The original source of assumptions is
the human programmer. My project is about taking
information from human experts on image processing
and deriving a system that uses such information for
image interpretation.

There are a variety of sources of expertise in
image processing. There are more [illegible]
at the low level processing done before segmentation
than at the levels of processing that assume
segmentation, which is why my project concentrates on
the low level systems. Some knowledge is about typical
arrangements of objects in the world and typical
colorations. Other knowledge is about the physics of
observation. Such knowledge can be used to build
boundary detectors.

Other sources of knowledge are the operators
designed to detect various features of images. I intend
to incorporate the knowledge that went into the
construction of such operators into my system. A problem
with such operators is that the assumptions that were
used to build them are often implicit and unrecognized
even by the author of the operator. Thus part of my
project is devoted to recovering the assumptions
behind known operators. Such information is useful
outside the context of my project.

To understand how to turn knowledge into operators
or analyze operators for the knowledge within
them, it helps to know certain facts about expert
knowledge on image understanding. Expert
knowledge is compartmentalized. Knowledge exists
about how to use shading to derive surface orientation
[12]. Knowledge exists about how to derive surface
orientation from texture in the image [14] [13] [1]. But
little work has been done to derive surface orientation
from both shading and texture even though both exist
in most images. Thus an image interpretation system
has to derive information from a variety of knowledge
sources whose expertise is limited to a particular set of
situations that use part of the information in the
image.


The image interpretation system that I am building
contains a set of boundary detectors (developed
using different sources of expertise). I have developed
techniques for combining sets of detectors into a
system for detecting features such as boundaries.

The relationship between an image and the
imaged scene is not one to one. There are many
scenes that can cause any particular image. Thus
images are inherently ambiguous. Ambiguity makes it
unreasonable to expect to devise a boundary detector
that gives as output a decision of whether a boundary
exists or not at a point. A probability of the boundary
existing is a more reasonable output. Thus the
detectors that are used by the image interpretation
system in my project have probabilities as output.

3. Theories

Two theories are required for my project.
(Theory is used here in the sense of a body of
knowledge and techniques such as quantum theory.) I
need ways to take human expert knowledge on a
specific subject, such as the probability of boundaries
given particular texture edges, and generate an algorithm
that calculates the probability of some feature of
an image such as the probability of an object boundary
in a part of that particular image (feature will be
precisely defined in further sections). Such a theory is
a theory of feature detection. The other theory is
about combining the information generated by a set of
feature detectors for the same feature. The detectors
are based on different assumptions or take into account
different data. My evidence theory is about combining
the output of a set of detectors.

4.  Definitions  of  Fundamental  Terms 

The conceptual universe contains not only all
observations but all structures used by various models
(or domains as explained later) to explain and model
the observations. One such structure is an array of
boundary features (feature is defined in the next
paragraph) that are 1 when a boundary passes through
their window in the image and 0 otherwise. In this
work I take the reasonable philosophical position that
there is a real world being imaged. The conceptual
universe is a representation of relevant aspects of this
world and of the image.

A feature of the conceptual universe is either a
particular observation or a parameter of a model
(domain) used to understand the observed image.
Features take on values and a feature's value is useful
in understanding the scene being observed. Note that
in signal analysis, biological modeling, and computer
vision feature is used in a more restrictive sense, to
refer to the output of functions applied to the observed
image. While my definition of feature does not
conflict with this usage it is an extension. A feature
space is a set of values that a feature of the conceptual
universe can attain.

An example of a feature is a boundary feature. A
boundary feature takes on the value boundary when
there is a region boundary passing through the part of
the image assigned to it. Otherwise it has the value no
boundary. A boundary feature's feature space is the set
{boundary, no boundary}. Another kind of feature is a
surface orientation feature that takes on values according
to the slant and tilt of the surface of the object imaged.
Its feature space is a set of ordered pairs of numbers.
Another kind of feature is the intensity observed at a
point in the image, an observed intensity feature.

Features  are  generally  associated  with  points  in 
either  the  image  or  the  three  dimensional  scene.  In  my 
work  I  concentrate  on  features  associated  with  the  two 
dimensional  image. 

Every image representation in a computer must
consist of a finite set of numbers (since an infinity of
numbers is too expensive to store). Thus every
representation of the conceptual universe in a computer
must be discrete. However the underlying
model of the scene could be continuous and much
vision and image reconstruction work is done on
discrete approximations of continuous models [16] [3].
My entire conceptual universe including the models is
discrete. Thus everything in my system can be directly
represented in my system. This eases evidence
combination.

5. Domains

In a previous section I described certain properties
of expert knowledge. A domain is a formal device
for manipulating expert knowledge. A domain is a set
of axioms (logical statements) that taken in conjunction
describe a body of expert knowledge. A domain can be
considered a logical statement and manipulated the
same way. A situation is an assignment of values to
features of the conceptual universe. A domain is
equivalent to the set of situations that do not contradict
the axioms in the domain. Thus a domain can be
manipulated in the same way as a set.
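
The equivalence between a domain and the set of situations that satisfy its axioms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the two-feature universe, the two axioms, and the names `FEATURE_SPACES`, `all_situations` and `domain` are hypothetical, chosen to show that taking axioms in conjunction corresponds to intersecting sets of situations.

```python
from itertools import product

# Hypothetical two-feature universe; feature names and values are illustrative.
FEATURE_SPACES = {
    "boundary": ("boundary", "no boundary"),
    "intensity": (0, 1),
}


def all_situations(spaces):
    """Enumerate every situation: one value assigned to each feature."""
    names = sorted(spaces)
    return [dict(zip(names, values))
            for values in product(*(spaces[name] for name in names))]


def domain(axioms, spaces=FEATURE_SPACES):
    """Represent a domain as the set of situations that contradict no axiom."""
    return frozenset(
        tuple(sorted(situation.items()))
        for situation in all_situations(spaces)
        if all(axiom(situation) for axiom in axioms)
    )


def bright_implies_boundary(situation):
    # Hypothetical axiom: a bright pixel only occurs on a boundary.
    return situation["intensity"] == 0 or situation["boundary"] == "boundary"


def always_bright(situation):
    # Hypothetical axiom: the pixel is always bright.
    return situation["intensity"] == 1


d_a = domain([bright_implies_boundary])
d_b = domain([always_bright])
d_both = domain([bright_implies_boundary, always_bright])

# Conjunction of the axioms is the intersection of the situation sets.
print(d_both == d_a & d_b)
```

Storing each situation as a sorted tuple of (feature, value) pairs makes it hashable, so ordinary set operations can stand in for logical operations on domains.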

An example of a domain (called $D_1$) is the set of
statements below:

(1) The image is of a set of regions of uniform
reflectance.

(2) Each region has a finite extent with a well defined
boundary.

(3) Each boundary consists of a finite set of corners
with curves of low curvature between them.

(4) Each surface has slowly varying surface orientation.

(5) All surfaces are illuminated from the same direction
with uniform intensity.

The domain above is realistic but rather complex.
A simpler domain has been used in my work so far
and is described later on.

A subset of $D_1$ is a domain, $D_2$, that assumes the
same as above and that all the surfaces are perpendicular
to the line of sight. $D_2$ applies to a subset of the
situations $D_1$ applies to. If the axioms of $D_2$ are taken
in conjunction with the axioms of $D_1$ and the result is
[remainder missing]

6. Markov Random Fields

In the domains I am using I assume that the
probability of feature values can be determined exactly if
I know the values of a set of nearby features. One
such assumption can be described thus. A boundary
feature is associated with a window in the image. It
attains the value boundary when a boundary passes
through that window. Otherwise it attains the value no
boundary. If I know the values of the pixels of the
window associated with a boundary feature, and the
values of the boundary features with areas around
that window, then I could determine the probability of that
boundary feature taking on the value boundary independent
of the values attained by the rest of the features
in the conceptual universe. Such an assumption of
locality is made in most of the work in vision and signal
processing. For each feature there is a set of
features that determine its probability distribution.
This set defines a function, where $F$ is the set of all
features in the conceptual universe:

$$N : F \to 2^F$$

The combination of $F$ and $N$ describes a neighborhood
system on $F$. A set with a neighborhood system is a
lattice. A setting of a subset, $S$, of $F$ (such as
$N(f)$) is an assignment of values, $v \in V$, to the
members of $S$. It is a function:

$$s : S \to V$$

A setting can also be described as an assignment of
values to $S$ from $V$. It also can be considered a set of
ordered pairs $(f \in S, v)$ where every member of $S$
appears on the left once and only once.

The domains I use describe a function $p_f$ that
takes a setting $s$ on $N(f)$ and returns a probability
distribution on the feature space, $V_f$, of $f$. In
mathematical notation:

$$p_f : (N(f) \to V) \to (V_f \to [0, 1])$$

The set of $p_f$ combined with the lattice described by $F$
and $N$ describes a Markov random field. The domains
I am using imply Markov random fields. (Actually any
domain sufficient to specify $p_f$ is trivially a Markov
random field where $N(f) = F$, but in the cases I am
interested in $|N(f)|$ is small relative to $|F|$.)
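
The two ingredients of such a field, a neighborhood function and a local conditional distribution, can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration: features are indexed by grid position, the neighborhood is 4-connected, and the smoothing formula in `local_distribution` is a toy rule, not a distribution taken from the text.

```python
def neighbors(f, width, height):
    """Hypothetical neighborhood function: 4-connected grid neighbors of f."""
    x, y = f
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return {(i, j) for (i, j) in candidates
            if 0 <= i < width and 0 <= j < height}


def local_distribution(setting):
    """Toy local conditional: a distribution over {boundary, no boundary}
    that depends only on the setting of the neighborhood.

    The formula (1 + k) / (2 + n), with k boundary neighbors out of n,
    is an illustrative assumption.
    """
    boundary_count = sum(1 for value in setting.values() if value == "boundary")
    p = (1 + boundary_count) / (2 + len(setting))
    return {"boundary": p, "no boundary": 1 - p}


corner = (0, 0)
# A setting: one value assigned to each member of the neighborhood.
setting = {g: "boundary" for g in neighbors(corner, 3, 3)}
distribution = local_distribution(setting)
print(distribution)
```

The point of the sketch is locality: `local_distribution` never looks at features outside the neighborhood, which is what keeps the neighborhood small relative to the whole feature set.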

Markov random fields have been used as a model
for a variety of image restoration algorithms [10] [?]
and for recognizing texture [6].

7. Hidden and Observed Features

An image analysis system often has the goal of
deriving implicit or "hidden" scene parameters from
the observed data. There may be other hidden parameters
that are not wanted but are useful as intermediate
results. Assume an imaging system needs to determine
where the lines are in an observed intensity image.
The lines are the desired hidden parameter while the
intensity image is the observed data. Often the edges
in the image are useful in determining where the lines
are. The edge configuration is also a hidden parameter.
The models I am using for the data consist of a
network of features. Some of them take on values
according to observed data. Such are observed features.
The other features represent structures in the image
that are not directly observable. Such are hidden
features.

8. Feature Detectors

Most human expertise about low level image
interpretation can be described in terms of feature
detection. A feature detector is an algorithm that outputs
information about the value of a particular
feature. Such information translates to (or is) a probability
distribution over the feature space of the feature.
As an example a boundary detector returns the probability
that a boundary passes through an area. For
each point in the image there is a feature that
represents the existence or nonexistence of a boundary
at a point. If the boundary detector returns p for an
area then the corresponding boundary feature has a
distribution of (p, 1-p) on the values (boundary, no
boundary). A feature detector takes as input the values of
the observed features and, possibly, estimates of the
distributions on neighboring features. Most established
feature detection algorithms can be described
in such a manner.

Note that from a lattice of hidden and observed
features, and from a feature detector, a domain can be
derived. That domain simply states that the probability
distribution of a feature, given a setting of its
neighborhood, is what the feature detector says it is. A
useful domain implies a feature detector also.


9. Likelihood Operators


Often it is easier to state and solve the inverse
vision problem (which is why computer graphics can
generate realistic images that current image understanding
systems can't analyze). It may be easier to
describe the probable structure of an observed intensity
image in the presence of a boundary than to
describe the probability distribution on the boundary
given an observed image. The probability that the
observed features are assigned according to a setting $o$
when a hidden feature, $f$, takes on value $v$ is the
likelihood of $v$ for $o$. I use as shorthand notation for this
$L_f(o \mid v)$. A likelihood generator is an algorithm that
uses a domain $D$ to estimate the likelihood of $v$ for $o$.
Thus I use as notation for the output of a likelihood
generator $L_f(o \mid v \wedge D)$. Given a likelihood generator
for $D$ and a prior estimate of the distribution of $f$'s
values then one can make a feature detector for $f$ using
Bayes' Rule:

$$P(v \mid o \wedge D) = \frac{L_f(o \mid v \wedge D)\,\mathrm{prior}_f(v)}{\sum_{v'} L_f(o \mid v' \wedge D)\,\mathrm{prior}_f(v')} \tag{1}$$


I call the feature detector thus derived a Bayesian
feature detector for domain $D$.

The set of likelihoods for a feature $f$ given an
observation $o$ contains more information than (1) uses.
The denominator in (1)

$$\sum_{v'} L_f(o \mid v' \wedge D)\,\mathrm{prior}_f(v') \tag{2}$$

is the probability that $o$ would occur given the prior
estimate of the distribution on $f$'s feature space. If the
probability is too low then the domain being used
probably is not correct. I use this information combined
with a priori information about the reliability of
the domain to derive an evidence theory further on.
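
A Bayesian feature detector of this kind is short enough to sketch directly. The Python below is a minimal illustration, assuming likelihoods and priors are supplied as dictionaries keyed by feature value; the function name `bayesian_feature_detector` and the numbers are hypothetical. It returns both the posterior distribution and the denominator, since the denominator is the quantity the text uses to judge whether the domain fits the observation.

```python
def bayesian_feature_detector(likelihoods, prior):
    """Apply Bayes' rule to one hidden feature.

    likelihoods[v] is the likelihood generator's output for value v given
    the observed setting; prior[v] is the prior estimate for value v.
    Returns (posterior, evidence), where evidence is the normalizing sum.
    """
    evidence = sum(likelihoods[v] * prior[v] for v in prior)
    if evidence == 0:
        # The observation is impossible under this domain and prior.
        raise ValueError("observation has zero probability under this domain")
    posterior = {v: likelihoods[v] * prior[v] / evidence for v in prior}
    return posterior, evidence


# Illustrative numbers only.
likelihoods = {"boundary": 0.6, "no boundary": 0.2}
prior = {"boundary": 0.25, "no boundary": 0.75}
posterior, evidence = bayesian_feature_detector(likelihoods, prior)
print(posterior, evidence)
```

A small `evidence` value signals that the observed setting is improbable under the domain, whatever the posterior says.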

10. Evidence Theory

The fundamental question I address is how to
combine two feature detectors. Each feature detector
implies a domain. To devise a combined feature
detector, I need to show how to combine two likelihood
generators given some prior information on the
two domains and their intersection. If I can combine
the output of two likelihood generators I can use the
combined generator to make a feature detector using
Bayes' rule. Then I have a feature detector that
returns useful data when either of the two domains
apply. In this section I use $O_f$ to represent a setting of
the features other than $f$.

10.1.  Combining  Likelihoods 

If I have likelihood generators for two domains,
$D_1$ and $D_2$, I would find useful a likelihood generator
that works for either domain. I call the domain that
holds when either $D_1$ or $D_2$ holds $D_1 \cup D_2$. In this
section I show how to make such a likelihood generator.

I assume that I can derive or know certain information
a priori. The a priori information is the probabilities
of $D_1$, $D_2$ and $D_1 \wedge D_2$ holding. One source of a
priori knowledge is statistics acquired by human
interpretation of a test suite of images. Other sources
could be a model of the scenes expected to be
observed by the imaging system.

I also assume that the a priori probability of the
feature taking on any value is the same under any of
my domains or combinations thereof. Thus all the
domains have the same prior bias regarding the
feature. Given such assumptions I can derive a
combination rule for likelihoods:


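$$L_f(O_f \mid v \wedge (D_1 \cup D_2)) = \frac{P(D_1)\,L_f(O_f \mid v \wedge D_1) + P(D_2)\,L_f(O_f \mid v \wedge D_2) - P(D_1 \wedge D_2)\,L_f(O_f \mid v \wedge D_1 \wedge D_2)}{P(D_1) + P(D_2) - P(D_1 \wedge D_2)} \tag{3}$$

$O_f$ is an assignment, $O_f : F - \{f\} \to V$, of all the features
besides $f$.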

Equation (3) contains a new term
$L_f(O_f \mid v \wedge D_1 \wedge D_2)$ which is the likelihood generator
whose domain is the conjunction of the axioms of $D_1$
and $D_2$. In the special case where the two domains are
disjoint ($P(D_1 \wedge D_2) = 0$), $L_f(O_f \mid v \wedge D_1 \wedge D_2)$ is multiplied
by 0 so is irrelevant. Here, the output of the combined
likelihood generators is the weighted average of
the outputs of the two other generators.
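
The combination of two likelihood generators can be sketched numerically in Python. This is a reading of the rule rather than a transcription: the general form used here follows from inclusion-exclusion together with the assumption that the feature's prior is the same under every domain, and the function name `combine_likelihoods`, its parameters, and the numbers are hypothetical. With the default `p12 = 0` it reduces to the weighted average that the text describes for disjoint domains.

```python
def combine_likelihoods(l1, l2, p1, p2, l12=0.0, p12=0.0):
    """Likelihood under "D1 or D2" from the likelihoods under each domain.

    l1, l2 : likelihoods of the observed setting under D1 and under D2
    p1, p2 : a priori probabilities that D1 and D2 hold
    l12    : likelihood under the conjunction of D1 and D2
    p12    : a priori probability of the conjunction (0 if disjoint)
    """
    total = p1 + p2 - p12  # prior probability that at least one domain holds
    if total <= 0:
        raise ValueError("at least one domain must have positive prior probability")
    return (p1 * l1 + p2 * l2 - p12 * l12) / total


# Disjoint domains: a weighted average of the two outputs.
disjoint = combine_likelihoods(l1=0.2, l2=0.6, p1=0.5, p2=0.25)

# D2 contained in D1: the conjunction is D2 itself, so l12 = l2 and p12 = p2.
nested = combine_likelihoods(l1=0.2, l2=0.6, p1=0.5, p2=0.25, l12=0.6, p12=0.25)

print(disjoint, nested)
```

In the nested call the terms for the smaller domain cancel, so the result equals the likelihood under the containing domain alone.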

An example of two disjoint domains is one that
assumes that objects in the world are Lambertian
surfaces, and another that assumes that the objects
have specular reflectance properties. Both domains
cannot be simultaneously true at any point in the
image (they both can be simultaneously false though).

Another example of two disjoint domains is one
domain that assumes Gaussian additive noise of standard
deviation 4±r and another that assumes Gaussian
additive noise of standard deviation 8±r. Both
domains cannot hold simultaneously. I use these
domains in experiments that test the effectiveness and
properties of my evidence combination rules on real
images (see further on).

If one domain is a subset of the other then the
output is that of the superset since the feature detector
for the superset is presumed to have already taken the
subset into account. If $D_1$ and $D_2$ have the
independence property described by equation (4) then
the output is bilinear in the output of the two feature
detectors.

$$L_f(O_f \mid v \wedge D_1 \wedge D_2) = \gamma(D_1, D_2)\,L_f(O_f \mid v \wedge D_1)\,L_f(O_f \mid v \wedge D_2) \tag{4}$$


Conditional independence satisfies (4) with $\gamma(D_1, D_2)$
always equal to 1. Bilinearity is a somewhat more
flexible criterion than that of being disjoint since
$\gamma(D_1, D_2) = 0$ means that the two domains are disjoint.
When the domains are not disjoint or bilinear the
situation is dealt with as a special case. (Perhaps
techniques for dealing with cases where the conjunction of
domains is significant will be developed during my
research.)

Note that the contribution of each likelihood generator
is proportional to the a priori probability of its
domain holding. The contribution of the likelihood
generator is also proportional to the value in equation
(2) which corresponds to the reliability of the operator.
Thus likelihood generators use the information in the
image about their own reliability when being combined.

10.2. Fred and the Thermometer

To illustrate my evidence theory I would like to
introduce a simple case originated by Glenn Shafer. It
is about detecting the temperature outside, given two
sources of expertise. There is a thermometer visible
outside. Fred has just walked into the room, here in
Rochester NY. Fred is an academic type. About 20%
of the time he is in his own world and thus his statements
have no understandable connection to the outside
world. When it is freezing he usually doesn't comment.
When it isn't he comments on it about 50% of
the time. The thermometer is a standard outside
thermometer and thus subject to breakdown. There is
about a 5% chance that the thermometer is broken.
The question is whether it is freezing outside. Fred
says, "It's nice and warm outside." The thermometer
reports 29 degrees Fahrenheit (approximately -2 C).
On days this time of year (fall) there's a 50-50 chance
of freezing weather. Table 1 shows the likelihoods
generated by Fred and the thermometer.


Table 1: Likelihoods for Freezing Weather

Event              Fred's   Thermometer's

Freezing Weather   0.00     1.00

Warm Weather       0.50     0.00

There are two domains, the Fred domain and the
Therm domain. The information from the domains
about the weather is shown in Table 1. Note that
there is no information about what happens when all
of my information sources are not reliable. Currently
my theory assumes that this never happens. I adjust
my evidence theory to deal with the possibility of total
unreliability in a later section.

Because I ignore the possibility that everyone is
unreliable, if only Fred's evidence is available then I
am 100% sure of warm weather. If only the thermometer
is seen I am 100% sure of freezing weather. Thus
I have two high confidence evidence sources
conflicting.

I believe these things about the relationship
between the two domains:

(1)  The  Fred  domain  is  80%  reliable. 

(2)  The  Therm  domain  is  95%  reliable. 

(3)  The  reliability  of  the  Fred  domain  is  independent 
of  the  Therm  domain.  Thus  both  domains  are 
reliable  76%  (95%  of  80%)  of  the  time. 

(4)  The  intersection  between  the  Fred  domain  and 
the  Therm  domain  is  0  when  we  make  the  obser¬ 
vations  we  have. 

This information about my domains is sufficient for me
to apply the likelihood combination rule.

I can combine the likelihoods from my two
domains using the likelihood combination rule derived
in the preceding section (Equation (3)). I need to handle
the case when both Fred and the thermometer are
in working order. The statements of Fred and the
thermometer contradict. But these statements are
impossible if Fred and the thermometer are giving
accurate readings. So the operator when Fred and the
thermometer are operational returns 0.


Thus by equation (3) the likelihood of freezing
weather is the weighted sum of the likelihoods output
by Fred and the thermometer, weighted by the probability
that they are accurate, and the result divided by
the sum of the probabilities of their accuracy minus the
probability that both are accurate. Thus the likelihood of
freezing weather is:


$$\frac{0.00 \times 0.80 + 1.00 \times 0.95 - 0.00 \times 0.76}{0.80 + 0.95 - 0.76} = 0.96$$

Thus the likelihood of freezing weather is 0.96. Equation
(3) applied to warm weather looks like:

$$\frac{0.50 \times 0.80 + 0.00 \times 0.95 - 0.00 \times 0.76}{0.80 + 0.95 - 0.76} = 0.40$$

Thus the likelihood of warm weather is 0.40.

If I now apply Bayes' rule to these likelihoods
with my 50-50 prior I get a 0.70 probability that the
weather is freezing.
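
The arithmetic of this example can be checked with a short script. This is only a sketch: the function names are hypothetical, and the combination formula is written in the two-domain form used in the worked equations above (weighted sum of likelihoods, minus the joint term, divided by the probability that at least one domain holds).

```python
def combine_two(lik_a, lik_b, lik_both, p_a, p_b, p_both):
    """Combine the likelihoods of two domains, weighting each by the
    probability that its domain holds (two-domain form of the rule)."""
    numerator = lik_a * p_a + lik_b * p_b - lik_both * p_both
    denominator = p_a + p_b - p_both  # probability that at least one holds
    return numerator / denominator


def posterior(lik_h, lik_not_h, prior_h):
    """Bayes' rule for a hypothesis and its complement."""
    joint_h = lik_h * prior_h
    joint_not_h = lik_not_h * (1.0 - prior_h)
    return joint_h / (joint_h + joint_not_h)


P_FRED, P_THERM = 0.80, 0.95
P_BOTH = P_FRED * P_THERM  # independence of the two reliabilities: 0.76

# Likelihoods from Table 1; the joint operator returns 0 because the
# two statements contradict each other.
lik_freezing = combine_two(0.00, 1.00, 0.0, P_FRED, P_THERM, P_BOTH)
lik_warm = combine_two(0.50, 0.00, 0.0, P_FRED, P_THERM, P_BOTH)
p_freezing = posterior(lik_freezing, lik_warm, 0.5)

print(round(lik_freezing, 2), round(lik_warm, 2), round(p_freezing, 2))
```

Rounded to two places, the three values agree with the 0.96, 0.40 and 0.70 quoted in the text.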


10.3. Taking Uncertainty into Account

In the example of "Fred and the Thermometer," I
calculated the likelihoods and conditional probabilities
when at least one domain has its assumptions met. I
have up to now ignored the possibility that none of my
domains may apply. In this section I show how to
take this possibility into account in my system.

I represent the possibility of none of my domains
holding by another domain $D_0$ that represents the case
where I am totally ignorant of the structure of the
universe. Here I use maximum entropy like techniques
[Max entropy ref] to make the likelihood generator
for this minimalist domain.

For $D_0$'s likelihood generator, $L_f(O_f \mid f \,\&\, D_0)$, a
good function is the one that preserves the prior. That
is, the posterior found using a prior is the same as the
prior. Such a set of likelihoods acts in accordance with
a naive conception of ignorance. The generator that
preserves the prior assigns equal probability to all
observations (maximum entropy assumption). The conditional
probabilities must sum to 1. Thus the likelihood generator
$L_f(O_f \mid f \,\&\, D_0)$ has a well defined value.
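
The claim that equal likelihoods preserve the prior can be written out in one line. In this sketch $N$ (the number of possible observations) and $c$ are symbols introduced only for illustration; they are not notation from the text.

$$P(f \mid O_f) = \frac{c\,P(f)}{\sum_{f'} c\,P(f')} = \frac{c\,P(f)}{c \cdot 1} = P(f), \qquad c = L_f(O_f \mid f \,\&\, D_0) = \frac{1}{N}$$

Because the same constant $c$ multiplies every feature value, it cancels in Bayes' rule, and the requirement that the conditional probabilities sum to 1 over the $N$ observations fixes $c = 1/N$.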

10.4. Fred and the Thermometer with Ignorance

Both Fred and the thermometer were quite reliable.
The probability of both of them failing is .01. $D_0$
gives equal likelihood to all observations given either
freezing weather or warm weather. I simplify the
situation by assuming there are four cases:

(1) Fred says nothing; thermometer reports below 32.

(2) Fred says it's warm; thermometer reports below
32.

(3) Fred says nothing; thermometer reports above 32.

(4) Fred says it's warm; thermometer reports above
32.

$D_0$ gives all 4 cases likelihood .25.

Table 5 describes how the likelihoods change for
the case of "Fred and the Thermometer" when the
possibility of ignorance is taken into account.

Table 5: Fred and the Thermometer with Ignorance

                     Ignorance Domain
Event              Without   With

Freezing Weather   0.96      0.95

Warm Weather       0.40      0.40

There is not much difference between the two because
I have high confidence in both Fred and the thermometer.
Using 50-50 as my prior on freezing weather I
get .70 as the probability of freezing weather. There is
no difference to the nearest .01 in the probability
because the ignorance case is unlikely. The domains
used for analyzing images are much less reliable so the
effect of the uncertainty domain is more pronounced.

For a more dramatic illustration consider the case
where I only have Fred's input but cannot see the
thermometer. The result of using $D_0$ here is shown in
Table 6:



Table 6: Just Fred with Ignorance

                     Ignorance Domain
Event              Without   With

Freezing Weather   0.00      0.10

Warm Weather       0.50      0.50

Without using the ignorance domain I have a probability
of 1.00 for warm weather given just Fred's response
because I have to ignore the possibility that he is
unreliable. With ignorance and a prior of 50-50 I have
a probability 0.83 for warm weather and a probability
0.17 for freezing weather.
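
The entries of Tables 5 and 6 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, and the ignorance likelihood of 0.50 in the Fred-only case is my reading (only two observations are possible when the thermometer cannot be seen), chosen because it reproduces the 0.10 in Table 6.

```python
def add_ignorance(lik_some_domain, p_some_domain, lik_ignorance):
    """Mix the likelihood obtained when at least one domain holds with the
    flat likelihood of the ignorance domain."""
    p_ignorance = 1.0 - p_some_domain
    return lik_some_domain * p_some_domain + lik_ignorance * p_ignorance


def posterior(lik_h, lik_not_h, prior_h):
    """Bayes' rule for a hypothesis and its complement."""
    joint_h = lik_h * prior_h
    joint_not_h = lik_not_h * (1.0 - prior_h)
    return joint_h / (joint_h + joint_not_h)


# Fred and the thermometer: at least one source is reliable 99% of the time,
# and the ignorance domain gives each of the 4 cases likelihood 0.25.
P_SOME = 1.0 - (1.0 - 0.80) * (1.0 - 0.95)
both_freezing = add_ignorance(0.95 / 0.99, P_SOME, 0.25)
both_warm = add_ignorance(0.40 / 0.99, P_SOME, 0.25)
p_freezing_both = posterior(both_freezing, both_warm, 0.5)

# Just Fred: he is reliable 80% of the time; with two possible observations
# the ignorance domain is assumed to give each likelihood 0.50.
fred_freezing = add_ignorance(0.00, 0.80, 0.50)
fred_warm = add_ignorance(0.50, 0.80, 0.50)
p_warm_fred = posterior(fred_warm, fred_freezing, 0.5)

print(round(both_freezing, 2), round(both_warm, 2), round(p_freezing_both, 2))
print(round(fred_freezing, 2), round(fred_warm, 2), round(p_warm_fred, 2))
```

The first printed line corresponds to the "With" column of Table 5 and the 0.70 posterior; the second corresponds to the "With" column of Table 6 and the 0.83 posterior for warm weather.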




11. Theory of Feature Detection

The previous section shows that it is simple to
develop an evidence theory with likelihood generators.
Earlier I showed how to derive a feature detector from
a likelihood generator and a set of priors. I called
such feature detectors Bayesian feature detectors. This
section describes techniques for development of Bayesian
feature detectors.

11.1. Developing Feature Detectors Directly from
Domains

A benefit of Bayesian feature detector theory is
that it is possible to build optimal feature detectors for
a domain. The domains I use have axioms that when
combined with a prior on the feature space of any
feature in my conceptual universe determine a unique
posterior on the feature space. A function that calculates
precisely the correct posterior probability distribution
on the feature space given the observed image and
the feature is the optimal Bayesian feature detector for
a domain. An example of a domain and its optimal
detector is shown further on.


A method for making a Bayesian feature detector
starts with a domain for a feature that is sufficiently
specific to calculate a probability for any observation,
given any of the different values in feature space.
Many such domains can be garnered from the image
reconstruction literature [2]. Usually in such a domain
the feature space values generate with some probability
a set of possible ideal images that then are operated on
by some noise processes that distort the result. When a
domain is of this form, an optimal operator can be
derived.

The  optimal  detector  is  often  too  computationally 
expensive  to  run  on  the  machine.  However  perfect 
accuracy  in  calculating  the  probability  distribution  is 
often  unnecessary  (and  usually  impossible  to  achieve  in 
any  case).  If  a  careful  analysis  is  made  of  the  degree 
of  accuracy  required,  an  algorithm  can  be  derived  that 
calculates,  without  undue  computational  expense,  a 
value  that  is  within  the  required  accuracy. 

I have built a Bayesian feature detector for a
parameterized set of simple domains. The domains
can be pictured as gray-level images of shapes
(regions) where these assumptions are made:

Region Monochromaticity

The gray-level of a region does not vary within
the region.

Duality

The shapes are thick enough so that a 3 pixel by 3
pixel window can only fall on two regions.
Corners are considered unlikely enough to be
ignored. This assumption is normal (sometimes
tacit) in line finding work.

Noise

The scene is viewed through an imaging device
that introduces a noise factor into its evaluation of
the value of each point according to a known set
of conditional probabilities. The noise is a parameter
of my implementation.




Histogram

The a priori probability of any pixel inside a
window being a certain gray level (before noise) is
known.

(Local) Ergodicity

The imaging device misregisters pixels. If a window
falls on a boundary between two regions the
result is equivalent to random selection of intensities
from the two regions (before noise). If the
window lies in the interior of the region, its pixels
are all selected from that region (before noise).
(To be realistic a more complex assumption is
necessary but the ergodicity assumption simplifies
computation.)

Window

The likelihood for a boundary feature can be
determined entirely from the values of a 3 by 3
window of observed intensities associated with the
feature. (This is another assumption that is not
realistic but simplifies calculations.)

An optimal Bayesian feature detector for boundaries
can be built for this domain by building operators
that calculate $L_f(O_f \mid B)$ and $L_f(O_f \mid \text{not } B)$. $B$
represents that there is a boundary at the point
represented by the feature (or near it). $O_f$ is the setting
of the intensity image. The window assumption
says that:

$$L_f(O_f \mid B) = L_f(W(f) \mid B), \qquad L_f(O_f \mid \text{not } B) = L_f(W(f) \mid \text{not } B) \qquad (5)$$

where $W(f)$ is the setting of the window about $f$. The
window and histogram assumptions imply that
$L_f(W(f) \mid \text{not } B)$ can be calculated by integrating, over
the different possible gray-levels, the probability of the
window occurring when it is completely within a
region of that gray-level. The histogram assumption
says that the probability of a window being seen in a
region of a particular gray-level can be calculated as
the product of the probability of each pixel being seen
in a region of that color. The probability of each pixel
can be looked up in a list of conditional probabilities
according to the noise assumption. To calculate
$L_f(W(f) \mid B)$ it is necessary to integrate over all pairs of
gray-levels that two regions can be.
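
The two window likelihoods described above can be sketched in code. This is an illustrative sketch, not the author's program: it assumes gaussian additive noise, a uniform histogram over a coarse grid of gray levels, and it models the ergodicity assumption as each pixel of a boundary window coming from either region with equal chance. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of an observed intensity x given true gray level `mean`."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lik_no_boundary(window, levels, sigma):
    """Sum over the single region's gray level of the product of the
    per-pixel noise probabilities (uniform histogram assumed)."""
    prior = 1.0 / len(levels)
    total = 0.0
    for g in levels:
        prod = 1.0
        for x in window:
            prod *= gaussian_pdf(x, g, sigma)
        total += prior * prod
    return total


def lik_boundary(window, levels, sigma):
    """Sum over all pairs of gray levels; each pixel is assumed to come
    from either region with probability 1/2 (reading of ergodicity)."""
    prior = 1.0 / (len(levels) ** 2)
    pdf = {g: [gaussian_pdf(x, g, sigma) for x in window] for g in levels}
    total = 0.0
    for a in levels:
        for b in levels:
            prod = 1.0
            for pa, pb in zip(pdf[a], pdf[b]):
                prod *= 0.5 * (pa + pb)
            total += prior * prod
    return total


def boundary_posterior(window, levels, sigma, prior_boundary=0.2):
    """Posterior probability of a boundary from the two likelihoods."""
    joint_b = prior_boundary * lik_boundary(window, levels, sigma)
    joint_nb = (1.0 - prior_boundary) * lik_no_boundary(window, levels, sigma)
    return joint_b / (joint_b + joint_nb)


levels = list(range(0, 256, 8))          # coarse grid to keep the sketch fast
flat_window = [100] * 9                  # 3 by 3 window inside one region
edge_window = [50] * 4 + [200] * 5       # 3 by 3 window straddling two regions
p_flat = boundary_posterior(flat_window, levels, sigma=8.0)
p_edge = boundary_posterior(edge_window, levels, sigma=8.0)
```

With the full 256 gray levels the double loop visits 65,536 pairs per window, which is the cost discussed in the text; the coarse grid here is only a convenience for the example.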

The algorithm implied by the above assumptions
is somewhat computationally expensive. In particular,
iterating through all pairs of colors and doing 9
multiplications and 1 add for each pair results in an
algorithm that does nearly 1,000,000 floating point
operations at each point. I have written a program that uses
a modified version of this algorithm on an image to
calculate the probability of a boundary at each point.

$L_f(W(f) \mid B)$ was approximated by assuming the pixels
resulting from the brighter region all have higher
intensity than the pixels resulting from the darker
region. Even though a variety of simplifications and
approximations were used to generate the feature
detector it did a decent job of determining boundaries
(comparable to ordinary edge detectors such as Sobel).

I have applied this feature detector using the
assumption of uniform probability for regions of all
gray levels and gaussian additive noise. I assume a
prior of .2 for a boundary passing through any 3x3
window. Figure 1 shows the result of applying my
feature detectors to artificial images assuming differing
standard deviations for the noise. I use standard
deviations of 4, 8, and 16 for testing purposes.


Figure 1:

Applying Approximated Optimal Operator
to Random Checkerboards

Brightness is proportional to the probability of
no edge (Black means edge. White means no
edge). Leftmost column is an artificial image
corrupted with gaussian additive 0 mean noise
of, from top to bottom, standard deviation 0, 4,
8, and 16 out of 256. The rest of the columns
are the output of Bayesian feature detectors.
Moving from left to right they are:

The combination (according to the evidence
combination rules described previously), with
equal weight, of the three detectors whose output
is shown to the right.

(1) The Bayesian feature detector that assumes
gaussian additive noise of standard deviation 4
mean 0 out of 256.

(2) The Bayesian feature detector that assumes
gaussian additive noise of standard deviation 8
mean 0 out of 256.

(3) The Bayesian feature detector that assumes
gaussian additive noise of standard deviation
16 mean 0 out of 256.

Figure  2  shows  the  result  of  applying  the  Sobel  edge 
operator  (thresholded)  to  these  artificial  images. 


Figure 2:

Applying the Sobel
to Random Checkerboards

Brightness is proportional to the probability of
no edge (Black means edge. White means no
edge). Left column is an artificial image corrupted
with gaussian additive 0 mean noise of,
from top to bottom, standard deviation 0, 4, 8,
and 16 out of 256. Right column is the result
of running the Sobel operator on the image, normalized
into [0,1].

Figure  3  shows  the  result  of  applying  my  feature 
detectors  to  a  real  aerial  photograph. 


Figure 3:

Applying approximately optimal operators
to aerial photograph

Upper left hand picture is the image. Brightness
is proportional to the probability of no
edge (Black means edge. White means no
edge). Upper right hand picture is the output of
the combined feature detector. Lower pictures
are the output of the approximately optimal
operators for gaussian additive mean 0 noise of
standard deviation 4, 8, and 16 out of 256 from
left to right respectively.

Figure  4  shows  the  output  of  the  Sobel. 


Figure 4:

Applying Sobel
to aerial photograph

Left hand picture is the image. Brightness is
proportional to the probability of no edge
(Black means edge. White means no edge).
Right hand picture is the output of the Sobel
normalized to [0,1].

I have also used my evidence combination rules to
combine several Bayesian features into one that is
more flexible. Figure 1 shows the output of the
feature detector that represents the equal combination
of the three feature detectors on the artificial images.
Figure 3 shows the result when the combined feature
detector is applied to the aerial photograph.

11.2.  Techniques  for  Deriving  Feature  Detectors  from 
Domains 

I have abstracted four techniques for reducing the
computational costs of feature detectors, introducing
some inaccuracy in the output of the feature detectors.
The techniques are:

(1)  Simplifying  the  Domain: 

Changing  the  assumptions  in  the  domain  to  make 
it  easier  to  analyze  and  build  detectors.  The  ergo- 
dicity  assumption  was  a  simplification  of  a  more 
complex  assumption  that  would  involve  windows 
with  only  a  corner  on  a  boundary  and  other  such 
complexities. 

(2) Reducing the Scope:

Reducing the data that needs to be examined to
evaluate the detectors. The window assumption
did that in an explicit way by limiting the amount
of the image that needed to be examined to determine
the probability of a boundary in the 3 by 3
window to that window. Actually the pixels
around the window also might be significant in
determining whether a boundary passes through
the window. The scoping assumption ignores this
possibility. The ergodicity assumption reduced the
scope in a more subtle way. As a result of the
ergodicity assumption the probability of a boundary
can be determined from the histogram of the
window rather than the window itself.

(3) Approximating the Feature Detector:

Using standard numerical techniques to approximate
the optimal feature detector with functions
that are computationally cheaper. As an example,
assume the probability of a boundary at a point
need only be approximated to the nearest .1,
under gaussian additive noise. In a sample image
I tested, I determined that more than 90% of the 3
by 3 windows in the sample image had probabilities
that are within .1 of 0 or 1 from the range of
gray levels in the windows. To determine this I
used the fact that when the range of values in a
window is larger than a certain value I can prove
that the probability it contains a boundary is
above 90%. When the range is smaller than
another constant then I can prove the probability
it contains a boundary is smaller than 10%. Only
10% of the windows in the image have ranges that
fall between these two constants.

(4) Finding a Sufficient Statistic:

Finding an easy to calculate function or functions
of the neighborhood whose output uniquely
determines the likelihood. This technique actually
loses no accuracy and can be a great computational
help.

If the domain is as above and the noise is known
to be gaussian additive of known standard deviation
then the likelihood of no boundary can be
determined from the mean and standard deviation
of the window and the likelihood of a boundary
can be determined from the means and standard
deviations of the pairwise partitions of the windows.
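
The sufficient-statistic claim for the no-boundary case can be made explicit with a short derivation. The symbols $x_i$, $n$, $g$, $\sigma$, $\bar{x}$ and $s$ are introduced here only for illustration: $x_1, \dots, x_n$ are the window intensities ($n = 9$ for a 3 by 3 window), $g$ is the true gray level of the region, and $\sigma$ is the known noise standard deviation.

$$\prod_{i=1}^{n} p(x_i \mid g) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - g)^2\right), \qquad \sum_{i=1}^{n}(x_i - g)^2 = n s^2 + n(\bar{x} - g)^2$$

where $\bar{x} = \frac{1}{n}\sum_i x_i$ and $s^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$. The window enters the product only through $\bar{x}$ and $s$, so summing over $g$ with the histogram weights gives a no-boundary likelihood that is a function of the window mean and standard deviation alone.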

11.3. Analyzing Established Feature Detectors

Another path to building Bayesian feature detectors
is to change established feature detection
algorithms into Bayesian feature detectors. Many established
feature detectors are computationally efficient
and extensively analyzed. To change a feature detector
into a Bayesian feature detector it is necessary to
extract from it the domain it is based on. Then the
reliability of the detector can be evaluated and its output
combined with the output of other feature detectors.

One way to extract the domain from a feature
detector is to consider the relationship between the
output of the feature detector and the probability
distribution for the feature space. Feature detectors
return output along a linear scale that is supposed to
correspond with probabilities. I base my technique for
domain extraction on a rigorous version of this
correspondence.

I define a feature detector and domain to be consistent
when, according to the domain's assumptions, the
feature detector is computing a function that is monotonic
with the function that is the probability distribution
for the feature space according to the domain. To
build a domain consistent with an established feature
detector I make some assumptions (to simplify the
problem) and then attempt to show mathematically
what other assumptions are necessary to assure that the
domain is consistent with the feature detector.


I have done an analysis of the one-dimensional
gradient by finding consistent domains for it. I started
by considering only domains with an assumption that
the probability of a boundary can be determined from
only two points, since the gradient only uses two
points. I also make the monochromaticity and noise
assumptions from the previous section. I also assume
that the probability of the gray levels is constant. I
found that domains that make these assumptions must
have symmetric, unimodal, and additive noise to be
consistent with the gradient. I found that domains
with a significant occluding noise source (where the
noise value replaces the true value rather than adding
to it) were not consistent with the gradient.

Consider a domain with gaussian additive noise (a
symmetric, unimodal and additive noise). When there
is a boundary between them the two pixels in the window
are of independent objects. The probability of
seeing that window is the product of the probabilities
of seeing the gray-levels of each of the two pixels. The
probability of seeing any gray-level at a pixel is constant.
So the product of the probabilities of the two
gray-levels seen is that constant squared, which shows
$L_f(W(f) \mid B)$ is a constant.

$L_f(W(f) \mid \text{not } B)$ can be calculated from table
lookup on the gradient. Thus an efficient Bayesian
feature detector can be derived from the one-dimensional
gradient.
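
The table-lookup detector described above can be sketched as follows. This is a minimal sketch under stated assumptions: gray levels are uniform over `num_levels` values, the noise is gaussian additive with standard deviation `sigma`, edge effects at the ends of the gray scale are ignored (so the difference of two pixels from the same region is treated as gaussian with variance $2\sigma^2$), and the boundary prior of 0.2 is borrowed from the 3x3 experiments purely for illustration. The function names are hypothetical.

```python
import math


def build_gradient_table(sigma, prior_boundary=0.2, num_levels=256):
    """Precompute the posterior probability of a boundary for every
    absolute gradient value of a two-pixel window."""
    p_level = 1.0 / num_levels
    # With a boundary the two pixels are independent, so the likelihood
    # is the constant per-pixel probability squared.
    lik_boundary = p_level * p_level
    # Without a boundary both pixels share one gray level, so their
    # difference is gaussian with variance 2 * sigma^2.
    variance = 2.0 * sigma * sigma
    norm = 1.0 / math.sqrt(2.0 * math.pi * variance)
    table = []
    for gradient in range(num_levels):
        lik_no_boundary = p_level * norm * math.exp(
            -(gradient * gradient) / (2.0 * variance))
        joint_b = prior_boundary * lik_boundary
        joint_nb = (1.0 - prior_boundary) * lik_no_boundary
        table.append(joint_b / (joint_b + joint_nb))
    return table


def boundary_probability(left, right, table):
    """Look up the boundary probability from the one-dimensional gradient."""
    return table[abs(left - right)]


table = build_gradient_table(sigma=4.0)
p_same = boundary_probability(100, 100, table)   # zero gradient
p_step = boundary_probability(10, 120, table)    # large gradient
```

Because the table depends only on the absolute gradient, the resulting probability is monotonic in the gradient, which is the consistency property defined earlier in this section.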

Figures  5,  6  and  7  illustrate  the  weakness  of  the 
gradient  given  an  occluding  noise  source  versus  an 
additive  one. 


Figure 5:

Graph of Probability of Boundary vs 1-d Gradient
Gaussian Additive Noise

Intensity is proportional to probability of a
boundary (White means boundary, black means
no boundary). From left to right is successively
higher gradients. From bottom to top is successively
brighter windows.

Figure 6:

Graph of Probability of Boundary vs 1-d Gradient
Occluding Noise

Intensity is proportional to probability of a
boundary (White means boundary, black means
no boundary). From left to right is successively
higher gradients. From bottom to top is successively
brighter windows.

Figure 7:

Graph of Probability of Boundary vs 1-d Gradient
Occluding Noise followed by Gaussian Additive Noise

Intensity is proportional to probability of a
boundary (White means boundary, black means
no boundary). From left to right is successively
higher gradients. From bottom to top is successively
brighter windows.

In these figures I graph the probability of a boundary
passing through a two pixel window, against the gradient
and the minimum gray-level of the pixels in the
window the gradient is calculated from. Figure 5
graphs the probability of a boundary assuming gaussian
additive noise. Note that the probability of a
boundary depends only on the gradient and not on
minimum intensity in the window. Figure 6 shows the
same graph assuming just occluding noise. Here, for
nonzero gradient the only variation in probability is
because of the differing minimum intensities in images,
with windows containing pixels likely to be noise having
a lower probability of containing a boundary. Figure
7 shows the same graph with a more realistic
assumption of occluding noise followed by gaussian
additive noise.


12.  End  of  Summary 

I  have  just  summarized  how  a  low  level  system  for 
image  interpretation  can  be  made  from  human 
knowledge  in  a  flexible  way  that  does  not  ignore 
significant  portions  of  either  the  a  priori  known  data  or 
the  observed  image.  I  have  summarized  the  result  of 
testing  some  of  these  techniques  using  simple  domains. 
Research on taking into account more varied sources
of information and finding realistic priors is underway.
Also  the  techniques  for  building  domains  and  analyz¬ 
ing  operators  are  to  be  applied  to  more  sophisticated 
models  and  operators. 

References 


1. J. Aloimonos and P. Chou, Detection of Surface
Orientation and Motion from Texture: 1. The
Case of Planes, 161, Computer Science
Department, University of Rochester, January
1985.

2. H. C. Andrews and B. R. Hunt, Digital Image
Restoration, Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey 07632, 1977.

3. H. C. Andrews and B. R. Hunt, Digital Image
Restoration, Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey 07632, 1977.

4. D. H. Ballard and C. M. Brown, in Computer
Vision, Prentice-Hall, Inc., Englewood Cliffs, New
Jersey, 1982, 102-105.

5. D. H. Ballard and O. A. Kimball, Rigid Body
Motion from Depth and Optical Flow, 70,
Department of Computer Science, University of
Rochester, November 1981.

6. R. Chellappa, Fitting Markov Random Field
Models to Images, 994, University of Maryland,
Computer Vision Laboratory, Computer Science
Center, January 1981.

7. R. Chellappa, Digital Image Restoration using
Conditional Markov Models, 1027, University of
Maryland, Computer Vision Laboratory,
Computer Science Center, March 1981.

8. H. Derin, H. Elliott, R. Cristi and D. Geman,
Bayes Smoothing Algorithms for Segmentation
of Binary Images Modeled by Markov Random
Fields, PAMI 6, 6 (November 1984), 707-720,
IEEE.

9. D. J. Fleet, The Early Processing of Spatio-
Temporal Visual Information, 84-7, University of
Toronto, Research in Biological and
Computational Vision, September 1984.

10. S. Geman and D. Geman, Stochastic Relaxation,
Gibbs Distributions, and the Bayesian
Restoration of Images, PAMI 6, 6 (November
1984), 721-741, IEEE.

11. A. R. Hanson and E. M. Riseman, VISIONS: A
Computer System for Interpreting Scenes, in
Computer Vision Systems, A. R. Hanson and E.
M. Riseman (ed.), Academic Press, London,
1978, 303-334.

12. B. K. P. Horn, Shape from Shading: A Method
for Finding the Shape of a Smooth Opaque
Object from One View, Massachusetts Institute of
Technology Department of Electrical
Engineering, August 1970.

13. K. Ikeuchi, Shape from Regular Patterns (an
Example of Constraint Propagation in Vision),
567, Massachusetts Institute of Technology,
Artificial Intelligence Laboratory, March 1980.

14. T. Kanade and J. R. Kender, Mapping Image Properties
into Shape Constraints: Skewed Symmetry,
Affine-Transformable Patterns, and the Shape
from Texture Paradigm, 133, Carnegie Mellon
University Computer Science Department, July
1980.

15. G. Reynolds, N. Irwin, A. R. Hanson and E. M.
Riseman, Hierarchical Knowledge-Directed
Object Extraction Using a Combined Region and
Line Representation, Proceedings: Image
Understanding Workshop, October 1984, 165-
168.

16. D. Terzopoulos, Multilevel Computational
Processes for Visual Surface Reconstruction,
Computer Vision, Graphics, and Image Processing
24, 1 (October 1983), 52-96, Academic Press.




LOCATING  CULTURAL  REGIONS  IN  AERIAL  IMAGERY  USING 

GEOMETRIC  CUES 


Pascal Fua and Andrew J. Hanson

Artificial  Intelligence  Center 
SRI  International,  Menlo  Park,  California 


ABSTRACT 

To  locate  cultural  regions  in  aerial  imagery,  we  merge  pixel- 
level  techniques  with  geometric  reasoning  and  generic  (as 
opposed  to  specific  or  template-like)  object  descriptions. 
We  utilize  discrepancies  between  the  generic  models  and 
the  image  data  to  refine  an  initial  low-level  segmentation 
and  produce  a  more  accurate  delineation  of  cultural  re¬ 
gions. 

1  Introduction 

Detecting and labeling scene objects is one of the more
demanding tasks in automated image analysis. In the typical
case of a high-altitude aerial image, there are no existing
segmentation techniques that can reliably produce
regions that have a one-to-one correspondence with objects
of interest. Most segmentation procedures produce
a wide mixture of undersegmented objects, where the object
is merged with other data, and oversegmented objects,
where the object is broken up into a "jigsaw puzzle" of
indistinct parts. Furthermore, such segmentations are normally
unstable with respect to minor changes in the program
parameters, digitization methods, viewpoint, scene
lighting, and film-processing methods.

We therefore propose to explore the application of
knowledge-based methods to the problem of correcting an
initial segmentation so it coincides with recognizable objects.
Other related efforts include those of Ohta et al
[1979], Nagao et al [1980], Reynolds et al [1984], Nazif
and Levine [1984], McKeown et al [1984], and Hwang et
al [1985]. Our work relies upon contextual geometric reasoning
and generic, template-free models of the features to
be extracted from the image. We overcome some of the
limitations of previous approaches by providing powerful
facilities for utilizing generic shapes and spatial context to
resolve undersegmented objects.


The work reported here was supported by the Defense Advanced
Research Projects Agency under Contract MDA903-83-C-0027 and by
the U.S. Army Engineer Topographic Laboratories under Contract
DACA72-85-C-0008.


For the purposes of our current work, we have imposed
the following constraints:

• Object type: We restrict ourselves to the identification
of cultural structures in aerial imagery, thereby
providing the opportunity to use such observations
as the presence of straight lines to focus attention
on regions likely to be components of a target object
[see, e.g., Shirai, 1978].

• Image data: We assume that we are given a digitized
aerial image that is essentially a straight-down
view, along with lighting and camera-model parameters.
Typical images used in our experiments have
scales of 1 to 2 feet per pixel on the ground.

• Initial segmentation: We assume we are provided
with a syntactic partition of the image computed by
an Ohlander-style segmenter [Ohlander et al, 1978;
see also Laws, 1982, 1984].

• Knowledge characteristics: We assume that no
precise templates of the target cultural objects are
available, and thus we are required to deal with complex
objects having only general, semantic descriptors.

Our  results  to  date  may  be  summarized  as  follows: 

• Undersegmented Regions Are Correctly Refined.
The identification of cultural portions of a
region on the basis of groups of parallel and perpendicular
lines leads to a very reliable splitting of
undersegmented regions when combined with other
contextual knowledge.

• Templates Are Eliminated. Many traditional
systems for discovering buildings use relatively rigid
rectangular templates, possibly with an allowable
range of constraints on dimensions [e.g., Binford,
1982; Hwang et al, 1985]. Instead, we employ generic
knowledge of the object geometry. By generalizing
the concept of a “side” to include a large class of
rectilinear zig-zag shapes and searching for rectangular
geometric relationships among these composite
shapes, we can accept and identify very complex
polygonal structures with rectilinear components.
No assumptions whatsoever are made about
specific shapes, and thus we avoid the restrictions
of the template approach while gaining substantial
power.

• Semantic Knowledge Supports Correction and
Labeling of the Initial Segmentation. We have
linked domain knowledge with image-level operations
in several ways to improve overall system behavior.
We utilize knowledge of how the segmenter is likely
to misplace region boundaries relative to desirable
edges to recover such edges in the resegmentation,
as well as to reject improbable geometries. Predicting
the way shadows may be separated or incorrectly
merged in the original segmentation leads to the correct
parsing of shadow evidence required for identification
of raised structures.

In the succeeding sections, we first describe our general
approach to the design of an object-recognition system,
and then present some initial results. We conclude with
our plans for future refinement of the system.

2 Approach to the Object Recognition Problem

Several observations and theoretical concepts form the basis
for our approach to the object recognition problem.

Recursive segmentation guarantees strong
derivatives. An Ohlander-style segmentation of an image
is recursive. A set of pixels in a given value range is selected
on the basis of the shape of a frequency-of-occurrence
histogram; these pixels are then labeled as belonging to one
of several regions on the basis of spatial contiguity. The
histogram of a region derived in this way will often have
a shape entirely different from the parent histogram. The
procedure is applied recursively until regions with no significant
histogram structure are obtained.
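The recursive structure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Ohlander or SLICE implementation: the function names are hypothetical, and the analysis of histogram shape is replaced by a much cruder stand-in (split at the midpoint of the widest gap between occupied grey levels, if that gap is at least min_gap).

```python
from collections import deque

def components(pixels):
    """Split a collection of (row, col) pixels into 4-connected components."""
    remaining = set(pixels)
    parts = []
    while remaining:
        seed = remaining.pop()
        part, queue = [seed], deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    part.append(nb)
                    queue.append(nb)
        parts.append(part)
    return parts

def split_threshold(values, min_gap=20):
    """Stand-in for histogram-shape analysis (an assumption of this sketch):
    midpoint of the widest gap between occupied grey levels, or None."""
    levels = sorted(set(values))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        if hi - lo >= min_gap and (best is None or hi - lo > best[1] - best[0]):
            best = (lo, hi)
    return None if best is None else (best[0] + best[1]) / 2.0

def segment(image, pixels=None, min_gap=20):
    """Recursively split a pixel set by value range, then by spatial contiguity."""
    if pixels is None:
        pixels = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    threshold = split_threshold([image[r][c] for r, c in pixels], min_gap)
    if threshold is None:
        return [pixels]  # no significant histogram structure: stop recursing
    low = [p for p in pixels if image[p[0]][p[1]] < threshold]
    high = [p for p in pixels if image[p[0]][p[1]] >= threshold]
    regions = []
    for group in (low, high):
        for part in components(group):
            regions.extend(segment(image, part, min_gap))
    return regions
```

Because each child region is re-analyzed with its own histogram, a region found deep in the recursion can differ sharply in value range from its spatial neighbors, which is the property the following paragraph relies on.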

Neighboring regions thus will often belong to noncontiguous
value ranges of the histogram; the deeper the level
of recursion, the more likely it is to find regions widely
separated from their neighbors with respect to the range
of pixel values in their histograms. Region boundaries tend
to lie on discontinuities in the pixel values and, therefore,
strong derivatives occur between regions.

In  Figure  1,  we  verify  these  observations  for  a  grey-scale 
image  by  showing  the  qualitative  correspondence  between 
segmentation  region  boundaries  and  the  pixels  in  the  image 
with  high  Sobel  derivative  strengths. 

Sobel directions align with region boundaries.
Edge direction can be determined in two ways. One is
to fit a line to a set of points in an edge sequence, and
the other is to compute the Sobel direction at a point.
Because of the high correlation between Sobel derivatives
and region boundaries shown in Figure 1, the latter will
be quite reliable (see also Burns et al, 1984, for another
approach).
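As a concrete reminder of what "the Sobel direction at a point" means, the following minimal Python sketch computes the Sobel gradient magnitude and direction at an interior pixel of a grey-scale image stored as a list of rows. The function name and the angle convention (degrees, measured from the +column axis toward the +row axis) are assumptions of this sketch. Note that the gradient direction is perpendicular to the local edge.

```python
import math

# Standard 3x3 Sobel kernels for the column (x) and row (y) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_at(image, row, col):
    """Return (magnitude, direction in degrees) of the Sobel gradient
    at an interior pixel; the caller must keep (row, col) off the border."""
    gx = gy = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    magnitude = math.hypot(gx, gy)
    direction = math.degrees(math.atan2(gy, gx)) % 360.0
    return magnitude, direction
```

Thresholding the magnitude over all pixels would give a binary image of the kind shown in Figure 1(c).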




Figure 1: (a) An example of an aerial image containing
houses. (b) The boundaries of the regions resulting
from a segmentation of the image. (c) A
binary image showing those pixels with strong
magnitudes of the Sobel derivative.

In Figure 2, we show a typical region boundary obtained
from the SRI SLICE segmenter [Laws, 1984], together
with the long, straight lines obtained by an algorithm
that looks only for consistency in the Sobel directions
of a contiguous set of boundary points. The sets of
points with compatible Sobel directions and the apparent
linear boundary pieces are in good agreement.
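One simple way to look "only for consistency in the Sobel directions of a contiguous set of boundary points" is sketched below. The tolerance, the minimum run length, and the rule of comparing each point against the first point of its run are all assumptions made for illustration; the text does not specify the actual criteria.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two directions in degrees (period 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def straight_runs(directions, tolerance=15.0, min_length=5):
    """Split an ordered list of per-point Sobel directions (one per boundary
    point, in boundary order) into runs whose members stay within `tolerance`
    degrees of the run's first point. Returns (start, end) index pairs, end
    exclusive, keeping only runs with at least `min_length` points."""
    runs = []
    start = 0
    for i in range(1, len(directions) + 1):
        run_ends = (i == len(directions)
                    or angle_diff(directions[i], directions[start]) > tolerance)
        if run_ends:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs
```

Each returned index range marks a candidate straight piece of the boundary; short or inconsistent stretches are simply dropped.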

Figure 2: (a) Typical region boundary taken from the
bottom center of Figure 1. (b) Long, straight
lines in the region boundary derived only from
requiring consistency of the Sobel directions in
sets of contiguous points.

Lines are classified by geometric direction.
Semantically significant clusters of lines are often collinear,
but laterally displaced. The direction that we assign to
a cluster of two or more collinear or parallel lines is a
weighted average of the directions of each individual line,
rather than the direction produced by fitting a line to the
complete collection of points. This distinction is illustrated
in Figure 3.
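The difference between the two ways of assigning a direction can be checked numerically. The sketch below is illustrative only: it assumes that the weights are the segment lengths (the text does not say what the weights are), it averages undirected directions by the usual angle-doubling device, and it fits the "complete collection of points" using segment endpoints only.

```python
import math

def line_direction(p, q):
    """Undirected direction of segment pq in degrees, in [0, 180)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0

def composite_direction(segments):
    """Length-weighted mean of segment directions. Directions are axial
    (0 and 180 are the same), so angles are doubled before averaging."""
    sx = sy = 0.0
    for p, q in segments:
        length = math.hypot(q[0] - p[0], q[1] - p[1])
        theta2 = math.radians(2.0 * line_direction(p, q))
        sx += length * math.cos(theta2)
        sy += length * math.sin(theta2)
    return (math.degrees(math.atan2(sy, sx)) / 2.0) % 180.0

def fitted_direction(segments):
    """Direction of the principal axis of all endpoints taken together
    (a total-least-squares line fit to the pooled points)."""
    pts = [pt for seg in segments for pt in seg]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy)) % 180.0
```

For two horizontal segments offset laterally, such as ((0, 0), (10, 0)) and ((10, 5), (20, 5)), the composite direction stays horizontal, while the pooled fit is tilted toward the offset.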


Figure 3: (a) The result of fitting a line to all the points
in a pair of parallel, offset lines. The resulting
direction is incorrect for the purposes of this
work. (b) The composite direction of two lines
computed from a weighted average of the direction
of each line.


Shadows may be separated efficiently. Shadows
form high-contrast regions with predictable geometric
shape characteristics [see, e.g., Shafer, 1985; Medioni,
1983]. Our line-extraction methods are especially appropriate
for extracting shadows that may have several broken
segments aligned with the sun azimuthal angle.

Backtracking mechanisms are supported. Backtracking
is accomplished in the current system using a library
of reversible, rule-like procedures. An example of
such a backtracking operation is shown in Figure 4; a composite
line can be broken when a rule gives preference to
the construction of a more complex structure, such as a
U-shape.

We have previously expressed portions of our system
in the framework of MRS [Genesereth et al, 1983] in an attempt
to utilize the backtracking facilities provided in such
a reasoning environment. In the current implementation,
we have chosen for practical reasons to revert to procedural
rule representation. Perhaps when a more complete
understanding of this problem domain is achieved, we shall
translate some of our procedurally represented rules into a
more succinct declarative representation.


Figure 4: Backtracking by breaking a composite line to
form a U-shaped structure. The U-shape is preferred
because it provides strong evidence for a
cultural object.


Geometric structure localizes semantically significant
subregions. The current system relies upon general
relationships such as perpendicularity and parallelness
of composite line structures to single out portions of an
arbitrarily shaped region that have suggestive polygonal
substructures. This information is then used to correct
the original segmentation.
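The role of perpendicularity and parallelness can be illustrated with a deliberately simplified sketch that looks at composite-line directions only. The tolerance value and the function names are hypothetical, and spatial tests (proximity, overlap, which end of the parallels is closed) that a real U or Parallel detector would need are omitted.

```python
from itertools import combinations

def axial_diff(a, b):
    """Difference between two undirected directions, in degrees within [0, 90]."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def relation(a, b, tolerance=10.0):
    """Classify a pair of line directions as parallel, perpendicular, or neither."""
    d = axial_diff(a, b)
    if d <= tolerance:
        return "parallel"
    if 90.0 - d <= tolerance:
        return "perpendicular"
    return None

def u_candidates(directions, tolerance=10.0):
    """directions maps a line name to its direction in degrees. Returns
    (side, side, closing) triples: two parallel lines plus a third line
    perpendicular to both. Direction-only test; positions are ignored."""
    found = []
    for a, b in combinations(sorted(directions), 2):
        if relation(directions[a], directions[b], tolerance) != "parallel":
            continue
        for c in sorted(directions):
            if c in (a, b):
                continue
            if (relation(directions[c], directions[a], tolerance) == "perpendicular"
                    and relation(directions[c], directions[b], tolerance) == "perpendicular"):
                found.append((a, b, c))
    return found
```

A pair of parallels with no closing line corresponds to the Parallel structure; a candidate with closing lines at both ends would correspond to a Box.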

We extract and use relationships such as in front of, behind,
between, beside, enclosed by, enclosing, at a certain
angle from, and at a certain distance from in both geometric
and contextual reasoning processes. This vocabulary
provides a basis for semantic reasoning, e.g., “look for dark
areas in the direction of the solar azimuthal angle relative
to a region boundary in order to confirm the hypothesis of
a building wall.”
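A rule of the kind quoted above could be approximated as follows. This is a sketch under explicit assumptions: the search direction is supplied directly as an angle in image coordinates (0 degrees along +column, 90 degrees along +row), and the reach in pixels and the darkness threshold are hypothetical parameters rather than values from the system described here.

```python
import math

def shadow_support(image, boundary_points, azimuth_deg, reach=3, dark_threshold=50):
    """Fraction of boundary points that have a dark pixel within `reach`
    steps along the given direction. `image` is a list of rows of grey
    levels; `boundary_points` is a list of (row, col) pairs."""
    dr = math.sin(math.radians(azimuth_deg))
    dc = math.cos(math.radians(azimuth_deg))
    rows, cols = len(image), len(image[0])
    hits = 0
    for r, c in boundary_points:
        for step in range(1, reach + 1):
            rr = int(round(r + step * dr))
            cc = int(round(c + step * dc))
            if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] < dark_threshold:
                hits += 1
                break
    return hits / len(boundary_points) if boundary_points else 0.0
```

A value near 1 for one side of a region and near 0 for the opposite side would be consistent with a raised structure casting a shadow on that side; how such a score is combined with other evidence is left open here.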




Once interesting region portions are selected, a pixel-based
line-linking procedure can be invoked to connect related
lines, complete corners, and close open-ended Parallels
or U’s. When the resulting links are satisfactory, the
undesirable portions of the region are amputated, leaving
clean cultural structures as the residue. Figure 5 illustrates
linking processes that would be carried out when significant
linear structures are present in an undersegmented
region.




Figure 5: (a) Resegmenting a region with a good U by
completing a corner. (b) Resegmenting a region
with a good Parallel by linking the elements
of a composite line.

3  Examples  and  Results 

The current implementation of the system consists of two
main sequences of operations:

• Discovering the geometric features and relationships
within each single region.

• Resegmenting some regions based upon geometric
relationships within a region or among distinct regions,
and grouping interesting regions based on context
knowledge.

Resegmentation is currently carried out using the F*
algorithm of Fischler et al [1981]. We compute the required
cost array by using the Sobel edge strength combined with
geometric constraints on the directions in which edge
completion is predicted to take place. As a result, when the
Sobel strength near a boundary segment follows a desirable
path different from the boundary, F* will pick up
that path.

The final result of the computation is a resegmentation
of the image with explicitly identified cultural-region
clusters. Below, we present three examples illustrating the
general features of the approach.

3.1 Example 1: An easy region

In the lower right-hand corner of the aerial image in
Figure 1a there is a house whose outline corresponds exactly
to one of the regions produced by the segmentation. The
good lines found in the region boundary are shown in
Figure 6. This house is characterized by the two sets of parallel
lines that close to form a Box; an appropriately located
shadow is also present.


Figure 6: The long, straight lines belonging to a distinct
house region. These lines form a Box structure,
indicating very strong evidence for a cultural
object and distinguishing the region from its
surroundings.


Even when the segmentation of an image is effectively
perfect, locating the cultural correspondences can be
nontrivial. Our method immediately focuses on this structure
without a priori knowledge of its shape and singles it out
because of its exceptional geometric structures.

In this case, no resegmentation is performed because
there is no significant difference between the paths found
by linking the lines and the region boundary itself. The
result is a single, identifiable house region, as shown in
Figure 7.




Figure  7:  The  single  identified  house  region  boundary 
overlaid  on  the  image.  No  resegmentation  was 
necessary  in  this  ideal  case. 


3.2 Example 2: Repartitioning a complex region

The next example, with the image, region boundaries,
and elementary line segments shown in Figure 8, contains
a heavily shadowed, approximately L-shaped, composite
building. The segmentation confuses complex porches with
roof tops, inappropriately combines sidewalks with the
roof, and merges a significant shaded roof portion with
background vegetation. The sunlit portions of the composite
roof are contained in a single region; the two main
lobes of this region are joined by a narrow neck. We observe
that, given only the good edges of this roof region
as shown in Figure 8c, the roof structure is confusing to
parse even for a human.

We first search for basic geometric relationships within
the roof-containing region. Two distinct U’s are found
that support the identification of a cultural object, one
in each lobe of the region. Both of these U’s require the
breaking of a composite T, a type of backtracking, for their
construction.

Next, the system attempts to link composite lines and
to close the open ends of the U’s to form boxes using
the line-linking algorithm. This procedure amputates the
porch and sidewalk appendages and leaves two Boxes,
outlined in Figure 9, that provide a clear semantic context.
Applying knowledge of shadows here generates the
hypothesis that both boxes are associated with the same
large shadow region, so we label the group as a composite
3-dimensional structure.




Figure 8: (a) Another image containing a complex house
structure. (b) Region boundaries. The upper
L-shaped structure is a shadow; the lower
L-shaped structure arises from two juxtaposed
pieces of sunlit roof joined into a single region
by a narrow neck. (c) Long, straight lines in
the boundary of the sunlit roof region.

We have thus succeeded in taking a single, confusing
region and using its geometric structure to break it up into
manageable parts. We note that the area enclosed between
the pair of Box structures and the shadow is a heavily
shaded, peaked-roof portion whose region features are so
poor that it could not have been recognized by our basic
methods; the labeled enclosing regions now provide the
required semantic context to support this identification.

Figure 9: Final results of the splitting. The initial segmentation
is split several times to give two
subregions with good Box structures whose
boundaries are outlined in the figure. The large
shadow region is recognized as common to both
subregions. The area between the shadow and
the Box structures is now identifiable by its
semantic context as a heavily shaded roof portion.

3.3 Example 3: Multiple region clustering

In Figure 10, we show another portion of the image of
Figure 1a and its segmentation. This image is typical of
cultural scenes that are difficult to parse using
pattern-matching techniques because the terrain and roads are
highly irregular and the houses have very complex shapes.
Figures 11 and 12 show typical regions resulting from the
segmentation of a house-containing area, along with
illustrations of the process by which geometric structures are
discovered. The first region contains an excellent U, while
the second has a Parallel.

When we repeat the analysis for each region in Figure 10,
we find only these two regions that have suggestive
structures and appear to be geometrically related. Since
an appropriate shadow region is present, we deduce that
these regions probably belong to a single cultural cluster.

The geometric relations among lines in the boundaries
of these regions are now used to predict the locations of the
resegmentation boundaries to be constructed by linking.
The results of the linking and resegmentation operations,
depicted in Figure 13, show clearly the successful extraction
of this complex building. We note that three different
types of repartitioning were carried out to achieve this:
(1) linking a corner formed by two lines belonging to a single
region, thereby splitting off an irrelevant appendage;
(2) linking a corner whose lines belong to two separate
regions, thereby splitting yet a third region lying between
them (this completes a U whose sides are the parallel lines
in Figure 12); and (3) closing off the bottom of the U’s
formed by each of the two major roof segments.


Figure 10: Left portion of the image of Figure 1a with
segmentation boundaries from Figure 1b overlaid.



Figure 11: An illustration of the procedure by which geometrical
relations are constructed within a single
region. (a) Boundary of a typical region including
portions of a house. (b) An example of
a composite line with many elementary components
extracted from the boundary. (c) A pair
of parallel lines formed within the boundary by
two composite lines. (d) The U structure constructed
by finding a line in the boundary that
closes off one end of the parallels. In this example,
all good line segments belong to the U.



Figure 12: (a) Border of a second region belonging to the
same house. (b) These parallel lines are the
best structure that can be built. The Sobel directions
of the short left edge are not sufficiently
consistent to allow us to accept it as a closing
line for a U structure.

The roof segments are labeled as belonging to a
3-dimensional, raised structure with a peaked roof, since
they correspond to a “sunny side” and a “shady side” of
the roof, with a narrow shadow adjacent to the “shady
side.”


Figure 13: The results of computing linking lines and cutting
regions accordingly. A third region comes
into play when the linker completes the right-hand
corner. The resulting three regions contain
the area that one would visually associate
with a house.


4 Plans for Future Refinement

We plan to add the following enhancements to the current
system during the next stage of development:

• Generate interactive explanations of various actions
to facilitate user understanding and debugging of domain
rules; support user input of domain knowledge
and corrections of the labeling.

• Merge “jigsaw puzzles” of objects that have been
badly oversegmented.

• Extend the domains of expertise to include “explainable
anomalies,” of which the current shadow analysis
is one example.

• Support additional classes of target objects.

• Incorporate additional geometric information such
as perspective distortion of target shapes present
in oblique views and nonplanarity of the underlying
land.

• Support exploitation of multiple images covering the
same scene.

The investigation described here explores a number of
promising theoretical directions for knowledge-based partitioning
and object identification, and produces satisfying
experimental results for particular classes of images. Our
next task will be to extend these ideas while incorporating
support for explanatory interactions with the user.


References

T.O. Binford, “Survey of Model-Based Image Analysis
Systems,” The International Journal of Robotics Research,
1, No. 1, pp. 18-64 (Spring, 1982).

J.B. Burns, A.R. Hanson, and E.M. Riseman, “Extracting
Straight Lines,” Proceedings of the Image Understanding
Workshop, pp. 165-168 (1984).

M.A. Fischler, J.M. Tenenbaum, and H.C. Wolf, “Detection
of Roads and Linear Structures in Low-Resolution
Aerial Imagery Using a Multisource
Knowledge Integration Technique,” Computer Graphics
and Image Processing 15, pp. 201-223 (1981).

M. Genesereth, R. Greiner, and D.E. Smith, “MRS - A
Meta-Level Reasoning System,” Stanford University
Heuristic Programming Project Report No. HPP-83-27
(1983).

V. Hwang, L. Davis, and T. Matsuyama, “Hypothesis Integration
in Image Understanding Systems,” University
of Maryland Report CAR-TR-130 (1985).

K.I. Laws, The PHOENIX Image Segmentation System:
Description and Evaluation, Technical Note 289, Artificial
Intelligence Center, SRI International, Menlo
Park, California (September 1982).

K.I. Laws, Goal-Directed Texture Segmentation, Technical
Note 334, Artificial Intelligence Center, SRI International,
Menlo Park, California (September 1984).

D. McKeown, W.A. Harvey, and J. McDermott, “Rule
Based Interpretation of Aerial Imagery,” Proceedings
of IEEE Workshop on Principles of Knowledge-Based
Systems, pp. 145-157 (1984). See also IEEE
Trans. PAMI, in press.

G.G. Medioni, “Obtaining 3D Information from Shadows
in Aerial Images,” Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition,
pp. 73-76 (1983).

M. Nagao and T. Matsuyama, A Structural Analysis of
Complex Aerial Photographs, Plenum Press (New
York, 1980).

A.M. Nazif and M.D. Levine, “Low Level Image Segmentation:
An Expert System,” IEEE Trans. PAMI 6,
pp. 555-577 (1984).

R. Ohlander, K. Price, and D.R. Reddy, “Picture Segmentation
Using a Recursive Region Splitting Method,”
Computer Graphics and Image Processing 8, pp. 313-333
(1978).

Y. Ohta, T. Kanade, and T. Sakai, “A Production System
for Region Analysis,” Proc. 6th Inter. Joint Conf.
on Artif. Intell., pp. 684-686 (1979).

G. Reynolds, N. Irwin, A.R. Hanson, and E.M. Riseman,
“Hierarchical Knowledge-Directed Object Extraction
Using a Combined Region and Line Representation,”
Proceedings of the Image Understanding Workshop,
pp. 195-204 (1984).

S.A. Shafer, Shadows and Silhouettes in Computer Vision,
Kluwer Press (1985).

Y. Shirai, “Recognition of Man-Made Objects Using Edge
Cues,” in Computer Vision Systems, Ed. A.R. Hanson
and E.M. Riseman, Academic Press (New York,
1978).




THE INFORMATION FUSION PROBLEM AND RULE-BASED HYPOTHESES
APPLIED TO COMPLEX AGGREGATIONS OF IMAGE EVENTS


Robert Belknap, Edward Riseman, and Allen Hanson


Computer and Information Science Department
University of Massachusetts
Amherst, Massachusetts 01003


ABSTRACT 

A rule-based system for combining information from
multiple sources of sensory data is described. Relational
rules, integrating data from the output of multiple low
level processes, are responsible for creating complex aggregations
of the data in order to obtain object hypotheses
with an associated confidence. Relational rules are defined
between primitive elements of the data abstractions so that
sets of elements across representations can be selected and
grouped on the basis of relational scalar measures. The
system is demonstrated using region and line data and a
set of relational measures defined over the two pixel-based
representations. The techniques presented are extensions
of an earlier rule-based system operating on single types of
data abstractions and are easily extended to include motion,
stereo, and range data.

I.  INTRODUCTION 

The problem of integrating information from multiple
low-level representations is just beginning to be investigated
in computer vision. In the past this had not been
a problem, since most systems dealt with only one type
of low-level or segmentation information (usually lines or
regions), and this information was extracted from only one
type of sensory input data (usually intensity or RGB). The
field of computer vision has matured and now there are
many applications where multiple low-level processes provide
partially non-redundant and useful information.

The problem of integrating multiple representations of
data can occur even when there is only one source of sensory
data. Different algorithms applied to the same data
will extract different feature events. For example, algorithms
for extracting straight lines often will provide information
that complements a region segmentation. Of
course, the utility of multiple representations is dependent
upon the particular set of algorithms employed, the task
domain, and the goals.

This work has been supported by the Defense Advanced Research
Projects Agency (DOD), ARPA Order No. N00014-82-K-0464,
and by the Air Force Office of Scientific Research under contract
F49620-83-C-0099.


In addition to different types of information extracted
from the same data, the use of multiple sources of data
is becoming more common. Depth maps are being obtained
directly from laser range data and indirectly from
motion and stereo algorithms applied to pairs or sequences
of images. These depth arrays are sources of additional
intermediate representations, such as surfaces or
three-dimensional lines at discontinuities of surface orientation.
Certainly different information will be obtained from intensity
arrays and depth arrays, and therefore representations
derived from depth maps provide information which
is important to integrate. It has become obvious that each
low-level process usually extracts only partial information
with a great deal of redundancy in these outputs; thus,
maximum reliability can only be achieved through processes
that can integrate information in a flexible manner.

This paper describes one aspect of this problem, that of
integrating information from multiple low-level processes
applied to multiple sources of sensory data to obtain a single
object hypothesis with an associated confidence. The
approach described in this paper is illustrated via regions
and lines extracted from the same RGB data. The techniques
presented, however, could easily be extended to include
motion, stereo and range data which yield an intermediate
representation of surfaces. The approach also will
naturally apply to motion attributes of lines, regions, surfaces,
and volumes which have 2D or 3D motion attributes.

II. BACKGROUND

II.1. THE INTERMEDIATE SYMBOLIC REPRESENTATION

It is generally accepted that a computer vision system
must perform a variety of transformations of the data during
the interpretation process. One of the key abstractions
is the transformation of pixels, or more generally arrays of
sensory data, into image events which can be named and
referred to by their properties. We refer to the resultant
representation as “the intermediate symbolic representation”.
This representation serves as the communication
interface between low and high level processes in the
VISIONS system [HAN78a, RIS84].

At a coarse conceptual level, the VISIONS system is
organized into three levels of processing: low, intermediate,
and high. Currently, the low-level, or segmentation,
processes output a symbolic representation of the
data in the form of regions and lines. Attributes, such
as color, texture, location, size, shape, and orientation,
are then calculated for each region or line. All of this
information is then organized into an intermediate data
structure [SOU85] which allows flexible access to attributes
of, and relations between, the intermediate symbolic entities.
The intermediate-level processes organize the low-level
output and produce initial hypotheses (for the high-level
processes which interpret the scene) through the use
of domain knowledge and provide control information to
modulate the lower level processes.

In general, the only requirement for placing another
type of low-level entity in the intermediate representation
is that each primitive element of that data type must have
a symbolic name (e.g., region-240, surface-38, corner-46)
and a non-empty set of attribute-value pairs. It is the values
of the attributes that provide the basis for initial interpretation
processes. At present the intermediate database
consists of regions and lines.
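The stated requirement (a symbolic name plus a non-empty set of attribute-value pairs) is easy to make concrete. The following Python sketch is purely illustrative; the class and method names are hypothetical and do not describe the actual VISIONS data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One primitive element of the intermediate representation."""
    name: str                      # symbolic name, e.g. "region-240"
    kind: str                      # entity type, e.g. "region" or "line"
    attributes: dict = field(default_factory=dict)

class IntermediateStore:
    """Hypothetical container of named entities keyed by symbolic name."""

    def __init__(self):
        self._entities = {}

    def add(self, kind, index, **attributes):
        """Register an entity; at least one attribute-value pair is required."""
        if not attributes:
            raise ValueError("each entity needs at least one attribute-value pair")
        name = f"{kind}-{index}"
        self._entities[name] = Entity(name, kind, dict(attributes))
        return name

    def get(self, name):
        return self._entities[name]

    def select(self, kind, predicate):
        """Names of entities of one kind whose attributes satisfy predicate."""
        return sorted(e.name for e in self._entities.values()
                      if e.kind == kind and predicate(e.attributes))
```

For example, after store.add("region", 240, area=812, mean_intensity=63.5), a new entity type such as "surface" could be added in the same way without changing the container, which is the extensibility property the paragraph describes.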

In the next two subsections we will very briefly outline
the low-level segmentation algorithms, and then the
current version of the rule-based system for generating hypotheses
from single types of intermediate elements. Succeeding
sections extend this system to operate over multiple
types of intermediate elements.

II.2. SEGMENTATION ALGORITHMS

We have developed our region and edge representation
based on two low-level algorithms: the Nagin-Kohler region
segmentation algorithm [NAG79, NAG82, KOH83, KOH84]
and the Burns linear feature extraction algorithm
[BU...]. The Nagin-Kohler algorithm involves the detection
of clusters in a feature histogram, associating labels
with the clusters, mapping the labels onto the image pixels,
and then forming regions of connected pixels with the
same label. In general, the process of global histogram
labeling causes many errors to occur; hence an additional
key aspect of this algorithm is the localization of the
histogramming and peak selection to subimages. Regions are
formed from the local histogram labels and then the results
in subimages are integrated via region merging across the
artificial subimage boundaries.

The Burns algorithm is directed toward the extraction
of linear features in intensity images, including
low-contrast linear features. It provides a low-level
representation of intensity variations by segmenting the intensity
array into connected subsets of pixels which have similar
gradient orientation; these pixels act as “edge-support
regions” of a linear feature, and various characteristics of the
associated line or edge can be extracted from them. Thus,
both regions and lines have a common pixel-based
representation, which is a significant advantage for information
integration.
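The grouping step described above can be illustrated with a short sketch. It is a deliberate simplification under stated assumptions, not the Burns algorithm itself: it uses forward differences for the gradient, a single fixed partition of orientation into buckets, 4-connectivity, and a magnitude threshold to ignore flat areas. The function name `edge_support_regions` and its parameters are hypothetical.

```python
import math
from collections import deque


def edge_support_regions(image, n_buckets=8, min_magnitude=1.0):
    """Group pixels into 4-connected sets that share a gradient-orientation bucket.

    `image` is a list of rows of intensities. Bucket count and magnitude
    threshold are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    bucket = {}
    for r in range(height - 1):
        for c in range(width - 1):
            gx = image[r][c + 1] - image[r][c]   # forward differences
            gy = image[r + 1][c] - image[r][c]
            if math.hypot(gx, gy) >= min_magnitude:
                angle = math.atan2(gy, gx) % (2 * math.pi)
                bucket[(r, c)] = int(angle / (2 * math.pi) * n_buckets) % n_buckets

    regions, seen = [], set()
    for start in bucket:
        if start in seen:
            continue
        seen.add(start)
        queue, region = deque([start]), set()
        while queue:
            r, c = queue.popleft()
            region.add((r, c))
            for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                # Neighbours without a usable gradient are absent from `bucket`.
                if n not in seen and bucket.get(n) == bucket[start]:
                    seen.add(n)
                    queue.append(n)
        regions.append(region)
    return regions
```

Each returned set of pixels plays the role of one edge-support region; a line would then be fitted to it in a later step that is not shown here.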

II.3. OBJECT HYPOTHESES VIA RULES ON
ATTRIBUTE RANGES

A simple type of knowledge source for generating
hypotheses of object class labels for particular regions has
been under development in the VISIONS environment
[WH.81, WEY83, RIS84]. These knowledge sources take the
form of simple rules defined in terms of ranges over a scalar
feature, and complex rules defined as combinations of the
output of a set of simple rules. The scores of these rules serve
as a focus of attention mechanism for other, more complex
knowledge-based processes. The rules can also be viewed
as sets of partially redundant features, each of which defines
an area of feature space which represents a “vote”
for an object on the basis of this single feature value. The
region attributes include color, texture, shape, size, image
location, and relative location to other objects. More
recently, the approach has been extended to lines, with
features including length, orientation, contrast, width, etc. In
many cases, it is possible to define rules which provide
evidence for and against the semantically relevant concepts
representing the domain knowledge. While no single rule
is totally reliable, the combined evidence from many such
rules should imply the correct interpretation. One additional
rule type is the intersection rule, which allows line
information to affect the score of a region or vice versa.

The simple rules consist of a feature and a piecewise-linear
mapping function specifying veto ranges and a central
positive voting range, with zero voting ranges around
it (see Figure 1). The mapping function is defined by
six parameters specifying values on the range of the feature;
these correspond to “veto”, zero score, maximum score,
maximum score, zero score, and “veto”, in that order. As
Figure 1 shows, this format is meant to approximate a
Gaussian-shaped function.
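A minimal sketch of such a mapping function follows. The text does not say how a veto is represented numerically or what the maximum score is, so the sentinel `VETO`, the default `max_score` of 1.0, and all parameter names are assumptions made for illustration.

```python
import math

VETO = -math.inf  # assumed sentinel; the text gives no numeric value for a veto


def simple_rule(value, veto_lo, zero_lo, max_lo, max_hi, zero_hi, veto_hi,
                max_score=1.0):
    """Piecewise-linear score for one scalar feature value.

    The six breakpoints are expected in non-decreasing order.
    """
    if value < veto_lo or value > veto_hi:
        return VETO                       # beyond either veto point
    if value < zero_lo or value > zero_hi:
        return 0.0                        # zero voting range: no evidence
    if value < max_lo:                    # rising ramp toward the plateau
        return max_score * (value - zero_lo) / (max_lo - zero_lo)
    if value > max_hi:                    # falling ramp away from the plateau
        return max_score * (zero_hi - value) / (zero_hi - max_hi)
    return max_score                      # central positive voting range
```

With breakpoints (0, 2, 4, 6, 8, 10), a feature value of 5 falls on the plateau, a value of 3 lies halfway up the rising ramp, and a value of 11 is vetoed.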

An interactive environment for constructing these rules
and displaying their effects has been created. A user can
write a rule and display the scores of either regions or lines
in an intensity-coded format. The system then allows the
user to edit the rule or to incorporate it into a complex
rule.

Complex rules are a hierarchical collection of other
rules, which may themselves be simple or complex. We
have typically structured the top-level complex rule for an
object as a set of five other complex rules which represent
color, texture, size, shape, and location rules. The scores
of these separate component rules are usually combined
with a weighted average function, although any numeric
combination of the scores can be used. A typical top-level
object hypothesis rule is shown in Figure 2. Complex rules
can be written and edited in the same way as simple rules
[BEL85].
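The hierarchical structure can be sketched as rules built from other rules. In this illustration a rule is any function from an attribute dictionary to a score. The veto sentinel and the choice to let any vetoing component veto the whole rule are assumptions; the text states only that a weighted average is the usual combination.

```python
import math

VETO = -math.inf  # assumed veto sentinel (not specified in the text)


def weighted_average(scores, weights):
    """Weighted mean of component scores; any veto vetoes the result (assumption)."""
    if any(s == VETO for s in scores):
        return VETO
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def complex_rule(components, weights, combine=weighted_average):
    """Build a rule from sub-rules, which may be simple or complex rules."""
    def rule(attributes):
        return combine([component(attributes) for component in components],
                       weights)
    return rule
```

Because `complex_rule` returns an ordinary rule, its result can be passed as a component of another complex rule, which gives the nesting described above.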

II.4. BINARY RELATIONS AND THEIR USAGE

The rules described in the previous section are unary:
they accept a region as input and return a confidence
for the hypothesized object. The highest ranked of these
hypotheses form a partial (and perhaps errorful) interpretation
of the original image. Rules may also be defined over
pairs of regions; rules of this type define a binary relation
between the regions, and the confidence value returned by
the rule specifies the degree to which the relation holds.
Binary rules can begin to capture some of the contextual
constraints which the developing interpretation is expected
to satisfy.

In the current implementation of the rule system, the
only binary rules are similarity rules [HAN85], which are
used to form aggregations of regions with similar properties.
In effect, the unary rule hypotheses form “islands
of reliability” which are extended using binary similarity
rules. Typically, these rules operate on primitives formed
by a single segmentation process (e.g., regions or lines),
and result in a merging of the primitives into a more complete
description, depending on the confidence returned by
the rules. Forming aggregations of elements from a single
segmentation process has advantages when dealing with
unreliable segmentation processes. Fragmented elements
can be grouped to form aggregates which match an object
model. The problem with this approach is that unguided
grouping of elements leads to a combinatorial explosion.
Section III.5 outlines a method of forming aggregates under
the guidance of other segmentation information, which
would significantly reduce the number of possible
aggregations.

Plausible groupings (perhaps with non-empty intersections)
of both regions and lines can be generated using binary
relations. Regions can be grouped on similarity of
multiple attributes, such as color, texture, and adjacency,
as well as on the degree to which the grouping satisfies
some unary constraint, such as shape (does the hypothesized
grouping form a rectangle?). Relations that could
be used to group lines include similarity of orientation and
proximity of endpoints [e.g., WET85]. Pairs or sets of parallel
lines could be grouped. Pairs or sets of lines whose
endpoints fall in the same local neighborhood could be
grouped. Each of these groups could then be stored as an
entity with its own set of attributes. Matching groups of
lines to a model would be much more reliable than matching
based on the attributes of a single line. Objects with
a definite shape, such as rectangular, could be extracted
from the groups of lines. Also, objects with characteristic
lines, such as the many parallel lines found in a road, could
be extracted using lines only.
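As an illustration of grouping lines with a binary relation, the sketch below scores pairs of segments on similarity of orientation and merges every pair whose confidence exceeds a threshold. The 10-degree tolerance, the 0.5 threshold, the linear falloff, and the function names are all assumptions chosen for the example.

```python
import math
from itertools import combinations


def orientation(line):
    """Undirected orientation of a segment ((x1, y1), (x2, y2)) in [0, pi)."""
    (x1, y1), (x2, y2) = line
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def parallel_confidence(a, b, tolerance=math.radians(10)):
    """1.0 for equal orientation, falling linearly to 0.0 at `tolerance`."""
    diff = abs(orientation(a) - orientation(b))
    diff = min(diff, math.pi - diff)      # orientation wraps around at pi
    return max(0.0, 1.0 - diff / tolerance)


def group(lines, relation, threshold=0.5):
    """Return index groups linked by pairs whose confidence exceeds threshold."""
    parent = list(range(len(lines)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(lines)), 2):
        if relation(lines[i], lines[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(lines)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Testing every pair is quadratic in the number of lines, which is one small instance of the combinatorial growth noted above for unguided grouping.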

Binary relations also permit the fusion of information
from multiple segmentations, or more generally from multiple
sources of information. We explore this issue in
Section III.

II.5. RELATED RESEARCH

There have been a few efforts to integrate results from
multiple low-level processes [HAN78b, KOH84, NAS83].
Usually these have involved the integration of line and region
data, which are the two most common types of
low-level algorithms employed.

Kohler [KOH84] proposes solutions to the information
fusion problem at the segmentation level. The first method
proposed is to allow each process to segment the image
independently. Regions from each segmentation are then
combined in one of two ways: 1) project all region boundaries
onto one image and merge the resulting regions, or 2)
project only those boundaries with support from all
segmentations and connect the resulting boundary segments.
The second method is to allow each segmentation process
to influence other segmentation processes during the initial
segmentation.

Nazif and Levine [NAS83] also address the problem of
integrating multiple sources of information at the segmentation
level. Their system uses a rule-based expert system
to segment natural scenes based on a region and line
representation. Production rules are used to split and merge
regions and lines based on spectral attributes and binary
relations between regions and lines. An additional set of
rules is used to group regions and lines into focus of attention
areas which are used to guide the application of
segmentation processes.

Reynolds  et  al.  [REY84]  use  region  and  line  informa¬ 
tion  to  guide  processing  in  a  hierarchical  interpretation  of 
very  large  aerial  images.  Lines  which  bound  regions  in  a 
coarse-level  segmentation  are  used  to  locate  regions  with 
rectangular  outlines.  The  parameters  of  long  lines  with 
high  contrast  are  mapped  through  a  Hough  space  to  find  a 
peak  of  common  orientation.  The  lines  which  correspond 
to this peak and the regions with rectangular outlines are
used to select interesting areas of the image, which are
then  subjected  to  finer  levels  of  segmentation. 

Bajcsy and Tavakoli [BAJ76] present a system for recognizing
roads from aerial images. In their system a world
model of roads is used to extract road segments or elongated
regions. These segments are then grouped based on
orientation and proximity to form road hypotheses.

McKeown, Harvey, and McDermott [MCK84] do map-guided
interpretation of aerial images by applying prototype
rules to regions. These rules build initial class or
sub-class hypotheses for each region. Regions of a common
class or sub-class are then grouped and the prototype
rules are reapplied to the groups. This process continues
until acceptable functional areas are defined for the image.

III. INTEGRATING REPRESENTATIONS
VIA  COMPLEX  AGGREGATES 

The fundamental problem that is being addressed in
this paper is the integration of multiple low-level
representations into the interpretation process. While the
approach presented here offers only one type of mechanism
and deals with only some of the most general levels of the
information fusion problem, there are several important
advantages. First, it extends an approach that has proven
to be somewhat effective on very complex natural scenes
[RIS84]. Secondly, it offers an entirely modular and natural
approach to incorporating additional processes and
representations as a vision system undergoes incremental
development; existing low-level representations do not have
to be modified in any way. Each low-level representation
exists independently of any other representation; the
integration is accomplished through the hypotheses created
by rules which relate entities in the independent
representations. Finally, the integration takes place at the
interpretation level, thereby bridging the gap between low and
high level processes.

The key integrating mechanism is the definition of
relations, with associated computational measures, between
primitive elements in each type of representation. Each
relation is expressed in the form of a rule called a relational
rule. These are defined between “primitive” elements so
that sets of elements across representations can be selected
and grouped on the basis of relational scalar measures.
The result, therefore, will be to extend our representation
to include complex aggregations of elements which satisfy
user-specified relations across the multiple representations.

Let us be somewhat more specific. Consider, for example,
the interpretation of a road scene on the basis of
an intermediate representation of lines, regions, and surfaces
(planar surfaces) with attributes such as orientation,
height off the ground plane, and distance. The formation
of a “road” object hypothesis should not be based
upon any single element, but rather upon an aggregation
of a set of lines, regions, and surfaces that have specific
relations to each other. One might set up a rule-based
strategy for searching for a region which is approximately
bounded by straight lines on either side, with a surface
that significantly overlaps the region, whose orientation is
approximately perpendicular to the gravitational normal
or parallel to the ground plane. At this point we wish to
caution the reader not to be misled into an oversimplified
view of the problem; there are extremely difficult issues to
be dealt with, such as fragmentation of lines, regions, and
surfaces, or lines and surfaces that are only partially consistent
with a given region. However, these are problems
that are implicit in the nature of the problem of integrating
unreliable and inconsistent information and will be true of
all approaches, not just the approach presented here.

In the sections that follow we will develop this approach
as applied to the output of region and line segmentation
processes that have been developed over several
years and which are currently available in the VISIONS
system. We are working on the extraction of surface elements
from depth maps produced by motion and stereo
algorithms, and when these are available we will extend
our representation to allow surfaces to be aggregated with
regions and lines.

III.1. REPRESENTATION OF REGIONS AND LINES

Before we discuss the development of relations between
lines and regions, let us briefly discuss the representations
of these two types of elements. Regions are by definition
a pixel-based representation because they are connected
sets of pixels. The line representation is somewhat unusual
and allows a very simple foundation for the line-region
relations. Lines in our algorithm are each extracted from
their line support region, which is the portion of the intensity
surface (i.e., set of pixels) that forms the intensity
discontinuity defining the line. The important point is
that lines are actually defined by a region, and therefore
both segmentation processes result in a pixel-based
representation. The natural duality between regions and their
boundary lines can be exploited in a straightforward manner,
since the line support region associated with a linear
feature should overlap the regions on either side of the
region boundary. This means that a set of line support
regions can be expected to be found superimposed on the
boundary of an intensity region. Conversely, a set of adjacent
intensity regions can be associated with a line support
region of a linear feature. Because each line is defined by a
line support region, lines and regions which don't intersect
but are close to each other will still be included in the
relational measures. Since the width of the line support
region reflects the width of the edge, the intersection of a
line support set with a region is a fairly accurate indication
that it is related to the region. Several examples are
shown in Figure 3.

III.2. MEASURES OF THE RELATION BETWEEN
LINES  AND  REGIONS 

The  simultaneous  use  of  both  region  and  line  informa¬ 
tion  permits  two  types  of  perceptual  grouping  processes 
to  take  place.  On  the  one  hand  a  region  or  set  of  regions 
can guide the grouping of lines, while on the other hand
a line or set of lines can guide the grouping of regions. In
the  following  discussion,  we  use  the  terms  region  and  line 
to  refer  to  either  the  primitive  elements  obtained  from  the 
corresponding  segmentation  process  or  to  sets  of  regions 
and  lines  already  grouped  by  some  other  process,  such  as 
the  similarity  grouping  described  earlier. 

Using regions as the primary element by which lines
are grouped leads to two types of measures: texture and
shape. Simple texture measures can be derived easily. All
lines within a region are filtered, histogrammed, or counted
in order to produce a linear texture measure for the region.
This method is useful for regions with some type of regular
texture, such as short lines of similar orientation or
contrast. Shape characteristics of a region can be derived
from the lines which fall on or near the boundary of the
region. Some primitive shape measures, such as rectangularity,
can be calculated from the set of lines which bound
the region. Other more complex measures of shape
characteristics can also be calculated from the bounding lines.

With lines as the primary element, regions can be
grouped and analyzed. One possibility is to assign a boundary
type to the line based on the attributes of the regions
which fall on opposite sides of the line. Another possibility
is to group all regions which fall on one side of the line as
possibly belonging to the same object.

A natural basis for the relations between lines and regions
is intersection. This is especially true in our system
since both lines and regions are in a pixel-based
representation. Intersection does, however, eliminate certain
relations from the database. Segmentation elements which
do not overlap will not be considered in the calculation of
relations. This could lead to problems when line and/or region
boundary placement is inaccurate. An alternative relational
method which addresses this problem is discussed
in Section III.3.

The intersecting elements of each segmentation are
found by computing the intersection of each line support
set with each intensity region. The lines resulting from this
process fall into three categories: boundary lines, interior
lines, and lines which are neither boundary nor interior and
as such are not useful in calculating texture or shape measures
(these are called “superfluous” lines). These three
line intersection types are illustrated in Figure 3. Three
measures were derived as a first pass at differentiating the
three types of line intersection. These measures are called
“interior-line-percentage”, “region-perimeter-percentage”,
and “line-length-percentage”.

The first measure, “interior-line-percentage”, is the ratio
of line area interior to the region to total line area. Line
area is calculated from the line support region obtained
from the first stage of the Burns straight line algorithm.
In order to avoid confusion, the term “line support region”
(originally used by Burns) will be replaced by “line support
set”, with the understanding that the pixels comprising
this set are contiguous. Since both the line support set
and the region are pixel-based, the measure is simply
the number of pixels common to both divided by the
total number of pixels in the line support set; an example
is shown in Figure 4.

The interior-line-percentage measure discriminates interior
from boundary lines but not boundary from superfluous
exterior lines. An interior line will have a value of
one for this measure, indicating that the line support set is
completely contained by the region. Boundary and superfluous
lines which intersect will have some value between
zero and one for interior-line-percentage. This indicates
that the line support set crosses the region boundary and
has at least one pixel in common with the region. All
lines that do not intersect the region will have a value of
zero for this measure.
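Since both elements are sets of pixels, the measure reduces to a set intersection. The sketch below assumes that pixels are stored as (row, column) tuples in Python sets; that storage choice and the function name are illustrative only.

```python
def interior_line_percentage(line_support, region):
    """Fraction of the line support set's pixels that lie inside the region.

    Both arguments are sets of (row, col) pixel coordinates.
    """
    if not line_support:
        raise ValueError("line support set must be non-empty")
    return len(line_support & region) / len(line_support)
```

For a 4 by 4 region, a support set lying wholly inside it scores one, a set straddling the boundary scores strictly between zero and one, and a disjoint set scores zero, matching the three cases described above.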

The second measure, “region-perimeter-percentage”, is
the ratio of region boundary pixels covered by the line support
set to the total length of the region perimeter. Figure
5 illustrates how this measure is calculated. The third measure,
“line-length-percentage”, is the ratio of the length of
a region boundary (i.e., the set of perimeter pixels) covered
by the line support set to the total line length. That
is, the number of perimeter pixels of the region which intersect
the line support set, divided by the length of the
line. See Figure 6 for a pictorial description of this measure.
This measure was intended to indicate how much of
a line actually contributes to a region boundary. A line
which lies approximately on a region boundary will have a
high value of line-length-percentage since its line support
set will cover a length of region boundary approximately
equal to its length. This measure distinguishes boundary
from superfluous lines, but not superfluous from interior
lines.


Region-perimeter-percentage and line-length-percentage
are related in a dual representation. Region-perimeter-percentage
measures the fraction of a region boundary
made up of one line, while line-length-percentage measures
the fraction of a line contributing to the region boundary.
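The duality shows up directly when both measures are computed from the same covered set of perimeter pixels. In this sketch the perimeter is taken to be the region pixels with at least one 4-neighbor outside the region, and the line length is supplied by the caller in pixel units; both conventions are assumptions, since the text does not define them.

```python
def perimeter_pixels(region):
    """Region pixels with a 4-neighbour outside the region (assumed definition)."""
    return {
        (r, c) for (r, c) in region
        if any(n not in region
               for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def region_perimeter_percentage(line_support, region):
    """Fraction of the region perimeter covered by the line support set."""
    perimeter = perimeter_pixels(region)
    return len(perimeter & line_support) / len(perimeter)


def line_length_percentage(line_support, region, line_length):
    """Covered perimeter length divided by the length of the fitted line."""
    covered = perimeter_pixels(region) & line_support
    return len(covered) / line_length
```

Both functions share the numerator, the count of perimeter pixels covered by the support set, and differ only in whether it is normalized by the perimeter or by the line. A region is assumed to be non-empty.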

The line-length-percentage measure may be unreliable
for several reasons. If the region boundary is convoluted
where the line intersects it, the region boundary length will
be too high and the measure will be inaccurate. The measure
does not distinguish between lines which are parallel
to the region boundary and lines which pass through the
boundary at an angle. Also, lines that fall along the region
boundary but do not intersect it are not included in
the measures. The next section presents some alternative
measures which address these problems.


III.3. ALTERNATIVE REGION-LINE RELATIONAL
MEASURES

Before describing how rules are defined, we wish to
briefly note a few alternative relational measures. These
alternatives address the problems listed in the previous
sections. These include inaccurate placement of line support
sets and region boundaries, the lack of line to region
boundary angle measures, and relations which are missed
due to the use of intersection as the basis for the relational
measures.

Line support sets are based solely on collections of pixels
with common gradient direction. The algorithm to fit
a line to these regions ignores small amounts of noise in
the line support region and usually computes an appropriate
line. The line to region relational measures presented
are based solely on the line support set, rather than the
extracted line, and may produce inaccurate results. One
solution to this problem is to use something other than the
line support set to compute the relational measures.

One approach is to use only those pixels through which
a line passes to compute the same relational measures presented
above. This would increase the accuracy of
interior/perimeter discrimination but would lead to problems
when a line lies parallel to the region boundary but does
not overlap it. In this case the region-perimeter-percentage
and line-length-percentage measures would be useless.

A variation of this approach is to define an artificial
line support region which uniformly surrounds a line. The
width of the artificial line support region could be varied
according to the sharpness of the edge. This would
retain the good properties of line support regions (fuzzy
boundary intersection and edge width) while eliminating
the problems of noisy line support sets.

A third alternative is to use the idea of chamfering
[BAR78] to form distances between region boundaries and
lines. A wave of spreading activation on the array can
be implemented to measure distance by starting all cells
(this algorithm is naturally thought of as implemented on
a cellular array machine) on the line (or region boundary)
with a value of 0 and propagating the field to connected
neighbors while incrementing their count by 1. In t steps
of propagation, cells that are a distance t away will be
reached with a count of t. The cells on the receiving region
boundary (or line) will have their distance determined
by the earliest marker (i.e., lowest value) or by some function
of the values resident in the cells corresponding to
the boundary or line. Thus, distance is obtained naturally,
and various other geometric relations, such as intersection,
parallelness, etc., can also be measured. This approach
captures many of the measures we have just presented but
does not demand the actual intersection of the two types
of elements in order to extract useful information.
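The propagation can be simulated serially with a breadth-first wavefront. The sketch below assumes 4-connected neighbors, so the counts are city-block distances; the function names and the `summarize` parameter (standing in for "some function of the values" on the receiving cells) are hypothetical.

```python
from collections import deque


def propagate_distance(sources, height, width):
    """Breadth-first wavefront from `sources`; returns a grid of step counts."""
    dist = [[None] * width for _ in range(height)]
    frontier = deque()
    for r, c in sources:
        dist[r][c] = 0                    # cells on the line start at 0
        frontier.append((r, c))
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < height and 0 <= nc < width and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1   # earliest marker wins
                frontier.append((nr, nc))
    return dist


def boundary_to_line_distance(boundary, line_pixels, height, width,
                              summarize=min):
    """Summarize the propagated counts over the receiving boundary cells."""
    dist = propagate_distance(line_pixels, height, width)
    return summarize(dist[r][c] for r, c in boundary)
```

Passing `max` or a mean in place of `min` gives other summaries; for instance, a small spread between the minimum and maximum over a boundary segment suggests that the segment runs roughly parallel to the line.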

III.4. RELATIONAL RULES

Relational rules are the method used to integrate information
from multiple representations to form complex
aggregations. The relational measures presented in the
preceding sections form one component of the relational
rules. The other component needed is a structured method
of accessing information from each representation and the
relations between elements of the representations. In our
system the relational rules take the form of intersection
rules, rules which allow access to all elements which intersect
a specific element.

Intersection rules are used to combine region and line
information in a uniform, flexible way. The rule type is
designed to allow any arbitrary calculation based on the
lines intersecting a region. For this reason the structure
of the rule is somewhat complex. An intersection rule is
made up of three components:

1) a filtering rule for selecting lines which intersect a region
based on relational measures;

2) a ranking rule which ranks the lines which intersect a
region based on line attributes; and

3) a combination function which calculates the final score
of the region-line aggregation based on the scores from
the filtering rule and the ranking rule.


The filtering rule is a complex line rule composed of
a simple rule for each relational measure. Using the existing
three relational measures, a filtering rule would be
composed of one simple rule for each of the measures:
interior-line-percentage, line-length-percentage and
region-perimeter-percentage. Any of these simple rules may be
omitted, in which case no filtering is done on that relational
measure. In this sense the filter is similar to the
simple rules applied to a region or line attribute, except
that it is applied to one or more of the relational measures.
In most cases the filtering rule is used to veto unwanted
lines so that they are not considered by the combination
function.

The ranking rule can be a simple or complex line rule.
The rule should rank each line based on how well it supports
the measure being computed for the region. For example,
if a texture measure is being computed, those lines
which represent texture (i.e., short, high-contrast) should
receive a high score from the rule. A rule to determine the
linearity of the boundary of a region should assign a high
score to long lines.

Although the ranking rule and filtering rule could easily
be combined into one complex rule, they have been kept
separate for reasons of clarity and efficiency. By defining a
separate filtering rule it is clear which lines will be included
in the complex aggregation. The ranking rule represents
the attributes of the lines, not the attributes of relations
between lines and regions. Also, since a line may intersect
more than one region, the filtering rule may have to be
applied multiple times to the same line. The ranking rule
can be applied once for each line in the database.

The final component of intersection rules, the combination
function, can be any arbitrary function which produces
a numeric value. The combination function is supplied
three inputs: the scores from the filtering rule, the
scores from the ranking rule, and the intersection measures
for each line. For maximum flexibility the combination
function also has access to the complete intermediate
representation.

These intersection rules can be used in some very diverse
ways. One example is to use a filtering rule on
interior-line-percentage to select only those lines which are
interior to a region. The ranking rule could then be defined
to select short, high-contrast lines. The scores of the ranking
rule could then be averaged to form a complex texture
measure. Alternatively, a density measure could be calculated
by counting the occurrences of lines which receive a
high score from the ranking rule and then normalizing by
the size of the region.
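The three components of an intersection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Line` record, the piecewise-linear `ramp` scoring, and every threshold value are hypothetical and are not taken from the VISIONS implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    length: float        # line length in pixels (assumed attribute)
    contrast: float      # contrast across the line (assumed attribute)
    interior_pct: float  # interior-line-percentage for one region, in [0, 1]


def ramp(value: float, low: float, high: float) -> float:
    """Piecewise-linear score: 0 at or below `low`, 1 at or above `high`."""
    return max(0.0, min(1.0, (value - low) / (high - low)))


def filter_rule(line: Line) -> float:
    """Scores the line/region relation: high for lines interior to the region."""
    return ramp(line.interior_pct, 0.5, 0.9)


def ranking_rule(line: Line) -> float:
    """Scores line attributes only: high for short, high-contrast lines."""
    shortness = 1.0 - ramp(line.length, 5.0, 20.0)
    return shortness * ramp(line.contrast, 10.0, 40.0)


def average_texture(lines: List[Line], filter_cutoff: float = 0.5) -> float:
    """Combination function 1: mean ranking score of the lines that pass the filter."""
    kept = [ln for ln in lines if filter_rule(ln) >= filter_cutoff]
    if not kept:
        return 0.0
    return sum(ranking_rule(ln) for ln in kept) / len(kept)


def line_density(lines: List[Line], region_area: float,
                 filter_cutoff: float = 0.5, rank_cutoff: float = 0.5) -> float:
    """Combination function 2: count of high-scoring lines, normalized by region size."""
    count = sum(1 for ln in lines
                if filter_rule(ln) >= filter_cutoff and ranking_rule(ln) >= rank_cutoff)
    return count / region_area


# One short interior line, one long interior line, one short line outside the region.
lines = [Line(4.0, 50.0, 1.0), Line(30.0, 50.0, 1.0), Line(4.0, 50.0, 0.2)]
texture = average_texture(lines)
density = line_density(lines, region_area=200.0)
```

Because the ranking rule depends only on line attributes, its score could be cached once per line, while the filter score has to be recomputed for every line/region pair.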

As an additional example, the measure line-length-percentage
could be used to select lines which lie mostly on
the boundary of the region. The ranking rule could then
be defined to favor long lines. The scores from the ranking
rule could then be averaged using region-perimeter-percentage
as a weighting factor to form a simple shape
measure.

Intersection rules are also applicable to the regions
which a line intersects. In this format a score is calculated
for the line based on the properties of the regions which it
intersects. Having this flexibility introduces the problem
of organizing the application and results of the intersection
rules in a meaningful way. The next section outlines the
organization method used in the current implementation
along with some alternative organization methods.

III.5. STRUCTURE AND SCORING OF COMPLEX AGGREGATIONS

There are many ways to apply relational rules to form
complex aggregations. The method used in the current
implementation of the VISIONS system is to define a primary
element type through which other elements are grouped
and a score formed. Other approaches include independent
use of each element type, a hybrid approach which
allows both of the above approaches to be used simultaneously,
and an approach which allows previously formed
aggregations to guide future element grouping. Each of
these approaches has advantages and disadvantages which
are outlined below.

The approach used in the current implementation is to
define one of the element types, specifically regions, as the
primary element type. All relational and object hypothesis
scores are attached to the primary elements. Groups of
secondary elements are only considered as a result of their
intersection with the primary elements.

An intersection rule is used to support an initial object
hypothesis by making it one component of a complex rule,
as described in Section II.4. If a rule were written for
the boundary lines of a region it would be added to the
complex rule at the level of the component rules. In this
case, the top-level complex rule would consist of the five
region-based component rules and as many line-based rules
as are needed to define the object. All of these component
rules are combined uniformly by a weighted average. A
top-level rule including line-based relational rules is shown
in Figure 7.
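The uniform combination of component rules can be sketched as follows. The rule names and weights are hypothetical placeholders; the text states only that region-based and line-based component scores are combined by a weighted average.

```python
from typing import Dict


def complex_rule_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of component rule scores (region-based and line-based alike)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Hypothetical component scores for one region: five region-based rules
# plus two line-based intersection rules.
component_scores = {
    "color": 0.8, "texture": 0.6, "location": 1.0, "size": 0.4, "shape": 0.2,
    "line_texture": 1.0, "line_shape": 0.0,
}
component_weights = {name: 1.0 for name in component_scores}
hypothesis_score = complex_rule_score(component_scores, component_weights)
```

Since line-based rules enter at the same level as region-based ones, adding a new intersection rule only requires a new entry in each dictionary.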

The idea of defining a primary element type through
which all grouping and scoring takes place can be applied
to any element type used. Regions were chosen over lines
in the current implementation for two reasons. The first is
that a single region is a better descriptor of image events
than a single line. The second reason is that regions were
the existing element type before line information was
integrated into the system. By using regions the high-level
code did not have to be changed.

A disadvantage of the existing implementation is that
the representation of the complex aggregation and score is
structured at the top level through elements in one
segmentation representation, in this case region elements. It
is possible that the line information could contain a perfect
representation of an object, but that the region information
is fragmented and unreliable. In this case a high score
for the object will not be obtained since each sub-region of
the object is intersected by only a subset of the lines for
the object.

Figure 8 shows an example of this. The roof is fragmented
into many smaller regions, but the line information
shows the outline of the roof fairly clearly. In this case no
region will get full benefit of the bounding lines since each
region is only bound by a subset of the roof lines. The
set of bounding lines could be extracted first, then the set
of regions bound by these lines could be used to form an
aggregate of regions on which the region score could be
based.

One solution to this problem is to allow independent
use of the information from each segmentation process.
This method would apply rules to each type of segmentation
element. Elements from each segmentation type with
high scores on related rules could be combined if they
intersect. The aggregate of elements would then
receive a higher score than each of the individual elements,
reflecting the additional support for that object hypothesis.
Figure 9 shows the structure of a complex rule which
uses this method.

Using each type of segmentation information independently
has the advantage of allowing good segmentation
results to influence an object hypothesis score, even if the
other segmentation processes did not extract the object.
One disadvantage of this method is that some segmentation
elements, lines for example, may not contain enough
information by themselves to form a reliable object hypothesis.
These types of elements need to be grouped before
object hypothesis rules are applied. In the first aggregation
method, when a primary element type is defined, secondary
element type rules are applied only to the elements
which are related to a primary element. Independent
application of all rules is inefficient since each rule must be
applied to every element in the segmentation.

A hybrid approach to the integration of segmentation
information could be defined by allowing a combination
of the above methods to be used. In some cases, areas
of constant texture for instance, it is desirable to use regions
as the primary element type. In other cases, particularly
when the object is defined by a simple geometrical
shape, lines should be used as the primary element type.
Each object could be defined by a separate model for each
type of segmentation information. These models could be
rank-ordered based on the reliability of each type of
segmentation process at extracting the primitive descriptors
of the object being modeled. When these models are
subsequently used, the highest ranked model is considered the
primary model. The other model types are used only on
the segmentation elements which intersect the primary
elements with good matches to the primary model. Only if
the primary model fails to locate good matches would the
secondary models be applied to the entire segmentation.
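The control flow of the hybrid approach might look like the sketch below. This is a hypothetical illustration, not code from the system: the `Model` record, the `good` threshold, and the representation of intersections as a set of unordered id pairs are all assumptions, and element ids are assumed to be unique across element types.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Set


class Model(NamedTuple):
    element_type: str              # e.g. "region" or "line"
    reliability: float             # used to rank-order the models for one object
    score: Callable[[str], float]  # element id -> match score in [0, 1]


def apply_hybrid(models: List[Model],
                 elements: Dict[str, List[str]],
                 intersects: Set[FrozenSet[str]],
                 good: float = 0.7) -> Dict[str, float]:
    """Apply the highest-ranked model everywhere, secondary models selectively."""
    ranked = sorted(models, key=lambda m: m.reliability, reverse=True)
    primary, secondary = ranked[0], ranked[1:]

    # The primary model is applied to every element of its type.
    scores = {e: primary.score(e) for e in elements[primary.element_type]}
    matches = [e for e, s in scores.items() if s >= good]

    for model in secondary:
        candidates = elements[model.element_type]
        if matches:
            # Restrict to elements that intersect a good primary match.
            candidates = [e for e in candidates
                          if any(frozenset((e, p)) in intersects for p in matches)]
        # If there were no good primary matches, fall back to the whole segmentation.
        for e in candidates:
            scores[e] = model.score(e)
    return scores


elements = {"region": ["r1", "r2"], "line": ["l1", "l2"]}
intersects = {frozenset(("l1", "r1")), frozenset(("l2", "r2"))}
region_model = Model("region", 0.9, lambda e: {"r1": 0.8, "r2": 0.1}[e])
line_model = Model("line", 0.5, lambda e: 0.6)
scores = apply_hybrid([line_model, region_model], elements, intersects)
```

In this example only line `l1` is scored by the secondary model, because it is the only line intersecting the well-matched region `r1`.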

The hybrid approach has all the advantages of the
independent approach. The difference is that only one type
of object rule is applied to the entire image, so the hybrid
approach is more efficient. Also, the primary element type
guides the grouping of secondary elements, which avoids a
combinatorial explosion.

A final approach solves these problems and also
addresses the problem of forming aggregates of elements from
one segmentation process. This approach uses the hybrid
method mentioned above and, in addition, allows existing
aggregations to guide the formation of additional
aggregations.

In this approach existing aggregations which have missing
elements, such as one side of a rectangle, are used to
group additional elements. Using the rectangle example,
the regions within the existing sides of the rectangle are
grouped to form a new aggregation. This aggregation is
then used in two ways. First, it is used as a single region
entity to re-compute the region-based rule scores. Second,
it is used to group additional lines in hopes of finding the
fourth side of the rectangle. In this way existing aggregations
are used to form new aggregations, both of elements
from one segmentation and elements from more than one
segmentation. This approach reduces the number of
combinations of elements considered for an aggregation, which
in turn reduces the combinatorics of the problem. It also
provides a guide for grouping elements from one
segmentation process.

IV.  RESULTS 

This section presents the results of running the
intersection rules on urban house scenes and on road scenes.
The results fall into two categories, those that derive
texture measures and those that derive shape measures.
Where appropriate the results of each component of a rule
are presented to illustrate the advantages of integrating
line and region information.

IV.1. RELATIONAL RULES USED FOR TEXTURE MEASUREMENT

A simple texture measure can be computed by counting
the occurrences of short, high-contrast lines within a
region and normalizing by the region size. The filtering rule
to compute this measure would use interior-line-percentage
to select only those lines which are completely interior to
the region. The ranking rule selects short, high-contrast
lines. The combination function counts the number of lines
with scores above some threshold and divides this number
by the size of the region in pixels. Figure 10 shows an
example of this rule applied to a house scene.

Some objects, notably the roof of a house, are
characterized by short horizontal lines interior to the region.
A rule to produce high scores for these regions can be
obtained by first writing a rule which selects short,
horizontal lines. An intersection rule is then constructed from
this line rule, a filter on interior-line-percentage which
selects only interior lines, and a combination function which
computes the density of lines whose contrast exceeds some
threshold. The results of applying such a rule to a house
scene are shown in Figure 11.

IV.2. RELATIONAL RULES USED FOR SHAPE ANALYSIS

A simple shape measure can be computed by determining
how much of a region boundary is made up of
long straight lines. The filtering rule for this measure uses
interior-line-percentage to select only those lines which lie
on a region boundary. The ranking rule is written to rank
long lines higher than short lines. The combination function
then uses region-perimeter-percentage to compute the
percentage of a region boundary made up of lines which
received a high score from the ranking rule. As Figure 12
shows, this rule is useful for extracting roads.
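A minimal sketch of the combination function for this shape measure follows. It assumes the filtering rule has already kept only boundary lines; the `BoundaryLine` record, the length thresholds, and the cutoff are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundaryLine:
    length: float                # line length in pixels
    region_perimeter_pct: float  # fraction of the region boundary this line covers


def rank_long(line: BoundaryLine, short_len: float = 10.0, long_len: float = 40.0) -> float:
    """Ranking rule: 0 for short lines, rising linearly to 1 for long lines."""
    t = (line.length - short_len) / (long_len - short_len)
    return max(0.0, min(1.0, t))


def boundary_straightness(lines: List[BoundaryLine], rank_cutoff: float = 0.5) -> float:
    """Fraction of the region boundary covered by lines ranked as long."""
    covered = sum(ln.region_perimeter_pct for ln in lines
                  if rank_long(ln) >= rank_cutoff)
    return min(1.0, covered)  # overlapping lines could otherwise exceed 1


# Two long lines cover three quarters of the boundary; one short line is ignored.
road_lines = [BoundaryLine(50.0, 0.25), BoundaryLine(60.0, 0.5), BoundaryLine(5.0, 0.2)]
straightness = boundary_straightness(road_lines)
```

A road-like region bounded mostly by two long parallel lines scores high here, while a region bounded by many short segments scores near zero.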

As an extension to the previous rule, the long lines
can also be ranked on orientation. An intersection rule of
this type can be used to find regions with straight edges
oriented in a certain direction. To define such a rule to
find vertically oriented regions, the ranking rule selects lines
which are both long and vertical. The combination function
then uses the score of the ranking rule and the height
of the region to determine if a region's vertical edges are
straight. A rule of this type is applied to a house scene in
Figure 13.

V.  FUTURE  WORK 

The first, and easiest, extension to the existing system
is to add additional segmentation processes to the
low-level system. The types of information that could
be added include motion, depth, and surface segmentations.
Each segmentation process would create a set of
elements with associated attributes which would be added to
the intermediate-level representation. These new elements
could then be used in the same way regions and lines are
used now. This would not involve any major modifications
to the system, but would greatly enhance the reliability of
its results.

Extensions considered for the relational measures include
additional relational measures based on line support
sets, the forming of artificial line support regions, and the
use of chamfering for relational measures. Each of these
modifications is aimed at increasing the accuracy and utility
of relations between existing elements. It is assumed
that when new types of elements (surfaces) are added to
the representation, only those rules relating the new element
type to existing types will have to be added to the
rule set.

One extension being considered for the formation of
complex aggregations is to use aggregations of lines instead
of individual lines as the line entities. The aggregates of
lines would be formed by grouping all lines with similar
orientation or locality of endpoints [e.g., WEI85]. Attributes,
such as shape features, could then be computed for each
aggregation of lines. These entities, along with their
attributes, would then be treated as new primitive elements
by the system. The existing relational measures would
still be meaningful for these entities since they are still
pixel based. Additional measures would probably be
desired, however, since the groups of lines would be complex
structures.

The existing method of forming complex aggregations
was implemented because of the ease with which it could
be integrated with the existing system. As some of the
problems of information fusion become better understood
it is assumed that this method will be augmented with
some of the ideas from Section III.5.

VI. CONCLUSION

The method described is a uniform, straightforward
way of combining evidence from multiple segmentation
processes. Some problems exist with the region-based system,
but the existing method significantly increases the
reliability of initial object hypothesis scores for some
objects.

VII. BIBLIOGRAPHY

[BAJ76] R. Bajcsy and M. Tavakoli, “Computer Recognition
of Roads from Satellite Pictures,” IEEE Transactions
on Systems, Man, and Cybernetics SMC-6
(September 1976), pp. 623-637.

[BAR78] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles,
and H. C. Wolf, “Parametric Correspondence and Chamfer
Matching: Two New Techniques for Image Matching,”
Proc. DARPA IU Workshop (May 1978), pp. 21-27.

[BEL85] R. L. Belknap, “An Interactive Rule Editing System
for the Definition of Initial Object Hypothesis
Rules,” forthcoming COINS Technical Report, University
of Massachusetts at Amherst, 1985.

[BUR84] J. B. Burns, A. R. Hanson, and E. M. Riseman,
“Extracting Linear Features,” Proc. of the Seventh
International Conference on Pattern Recognition (July
30 - August 2, 1984), Montreal, Canada (to be published
in PAMI).

[HAN78a] A. R. Hanson and E. M. Riseman, “VISIONS:
A Computer System for Interpreting Scenes,” Computer
Vision Systems (A. Hanson and E. Riseman,
eds.) (1978), pp. 303-333, Academic Press.

[HAN78b] A. R. Hanson and E. M. Riseman (Eds.), Computer
Vision Systems, New York, Academic Press, 1978.

[HAN85] A. R. Hanson, E. M. Riseman, J. S. Griffith and
T. E. Weymouth, “A Methodology for the Development
of General Knowledge-Based Vision Systems,”
Proc. IEEE Workshop on Principles of Knowledge-Based
Systems, Denver, CO, December 1984, pp. 159-170.

[KOH83] R. R. Kohler, “Integrating Non-Semantic Knowledge
into Image Segmentation Processes,” Ph.D. Thesis,
University of Massachusetts at Amherst, September
1983.

[KOH84] R. R. Kohler, “Integrating Non-Semantic Knowledge
into Image Segmentation Processes,” COINS Technical
Report 84-04, University of Massachusetts at
Amherst, March 1984.

[LAW85] D. T. Lawton, Personal Communication.

[MCK84] D. M. McKeown, W. A. Harvey and J. McDermott,
“Rule Based Interpretation of Aerial Imagery,”
Dept. of Computer Science, Carnegie-Mellon University
(September 1984).

[NAG79] P. A. Nagin, “Studies in Image Segmentation
Algorithms Based on Histogram Clustering and Relaxation,”
COINS Technical Report 79-15, University of
Massachusetts at Amherst, September 1979.

[NAG82] P. A. Nagin, A. R. Hanson, and E. M. Riseman,
“Studies in Global and Local Histogram-Guided Relaxation
Algorithms,” IEEE Transactions on Pattern
Analysis and Machine Intelligence 4 (May 1982), pp.
263-277.

[NAS83] A. M. Nazif and M. D. Levine, “Low Level
Segmentation: An Expert System,” Technical Report
TR-83-4 (April 1983), Electrical Engineering, McGill
University.

[REY84] G. Reynolds, N. Irwin, A. R. Hanson, E. M. Riseman,
“Hierarchical Knowledge-Directed Object Extraction
Using a Combined Region and Line Representation,”
IEEE Proceedings of the Workshop on Computer
Vision: Representation and Control (1984), pp.
238-247.

[RIS84] E. M. Riseman and A. R. Hanson, “A Methodology
for the Development of General Knowledge-Based
Vision Systems,” IEEE Proc. of the Workshop on Computer
Vision: Representation and Control (1984), pp.
15?-170.

[SOU85] Southwick, “FEATSYS - An Intermediate-Level
Representation of Image Feature Data,” forthcoming
Master's Thesis (expected February 1986), Computer
and Information Science Department, University
of Massachusetts at Amherst.

[WEI85] R. Weiss, A. R. Hanson, and E. M. Riseman,
“Geometric Grouping of Straight Lines,” these proceedings.

[WEY83] T. Weymouth, J. Griffith, A. R. Hanson, and
E. M. Riseman, “Rule Based Strategies for Image
Interpretation,” Proc. AAAI-83 (August 1983).

[WIL81] T. Williams, “Computer Interpretation of a Dynamic
Image from a Moving Vehicle,” Ph.D. Thesis
and COINS Technical Report 81-22, University of
Massachusetts at Amherst (May 1981).


Figure 1. Structure of a simple rule for mapping
an image feature measurement into support
for a label hypothesis on the basis of a prototype
feature value. The object-specific mapping is
parameterized by six values. (Horizontal axis: feature range.)


Figure 5. The derivation of region-perimeter-percentage.
The measure is computed by dividing
the length of region boundary covered by a line
support set (A) by the length of the entire region
boundary (B).

Figure 6. The derivation of line-length-percentage.
The measure is computed by dividing the length
of region boundary covered by a line support set
(A) by the length of the line (B).


Figure 7. The structure of a complex rule with relational rules added for texture and shape. The
rules r_1, ..., r_n and l_1, ..., l_m are rules over region and line attributes, respectively.


Figure 8. The lines bounding a roof (left) form a parallelogram, but the regions are fragmented
(right) so intersection rules will not consider the entire set of lines for any one
aggregation.


Figure 9. The structure of a rule for a complex aggregation formed by using line and region
information independently. The rules r_1, ..., r_m and l_1, ... are rules over region and line
attributes, respectively. (Node labels in the diagram: color, texture, location, size, shape.)


Figure 10. A simple texture measure computed by a relational rule which counts the density of
lines within a region. The lines which received a high score from the filtering rule (lines interior to
a region) are shown on the left. On the right, rule scores are shown mapped back to the regions.
Densely shaded regions received high scores.


Figure 11. A slightly more complex texture measure in which the orientation of the lines is taken
into account. On the left are the lines which received a high score from the ranking rule and the
filtering rule (short, horizontal lines which were interior to a region). The region scores are shown
on the right.


Figure 12. A relational rule to find regions which are bounded by long lines. On the left are the
lines which received high scores from the filtering and ranking rules (long lines which lie on a region
boundary). The region scores are mapped back to the image on the right.


Figure 13. A simple shape measure computed by a relational rule which measures the straightness
of vertical region boundaries. The lines which received a high score from the filtering rule and the
ranking rule (long, vertical lines which lie on a region boundary) are shown on the left. On the
right, rule scores are shown mapped back to the regions.


PROBABILISTIC SOLUTION OF ILL-POSED PROBLEMS IN COMPUTATIONAL VISION


J. Marroquin (1), S. Mitter (2), and T. Poggio


The Artificial Intelligence Laboratory, Massachusetts Institute of Technology
(1) Artificial Intelligence Laboratory and Laboratory for Information and Decision Systems
(2) Laboratory for Information and Decision Systems


Computational vision is a set of inverse problems.
We review standard regularization theory, discuss its
limitations, and present new stochastic (in particular,
Bayesian) methods for their solution. We derive efficient
algorithms and describe parallel implementations on
digital parallel SIMD architectures, as well as a new
class of parallel hybrid computers.


1. Introduction


1.1. Computational Vision

Computational vision denotes a new field in artificial
intelligence that has developed in the last 15 years. Its two
main goals are to develop image understanding systems
which automatically could provide scene descriptions
from real images, and to understand biological
vision. Its main focus is on theoretical studies of vision,
considered as an information processing task.

Since at least the work of David Marr (Marr, 1982; see
also Marr & Poggio, 1977), it has been customary to
consider vision as an information processing system
that could be divided into several modules at different
theoretical levels, at least as a first approximation. In
particular, Marr suggested that the goal of the first step
of vision is to obtain descriptions of physical properties
of three-dimensional surfaces around the viewer such
as distance, orientation, texture, and reflectance. This
first step of vision, up to what has been called the
2-1/2-D sketch or intrinsic images, is mainly bottom-up,
relying on general knowledge but no special high-level
information about the scene to be analyzed.

The first part of vision, from images to surfaces, has
been termed early vision. Although this point of view
has been embraced widely (see a set of recent reviews,
e.g., Brown, 1984; Brady, 1981; Barrow & Tenenbaum,
1981; Poggio, 1984), it is important to observe that
its correctness is still to be proven. In particular, it
is still unclear what the nature of the 2-1/2-D sketch
representation is, how different visual modules interact,
how their output is fused and what the role of high-level
knowledge is in early visual processes. The critical
problem of the organization of vision, of the control
of the flow of information from the different modules,
and of how high-level knowledge is used is still very much
an open problem.

In this paper, we do not consider this larger issue. Our
point of view is that a rigorous analysis of individual
modules of vision is bound to play an important role in
any full theory of vision.

1.2. Early Vision

Early vision consists of a set of processes that recover
physical properties of visible three-dimensional surfaces
from the two-dimensional images. Computational,
biological and epistemological arguments (see Marr
and Poggio, 1977) suggest that early vision processes
are generic ones that correspond to conceptually
independent modules that can be studied, at least to a
very first approximation, in isolation. Table 1 shows a
list of some of the early vision modules.


Table  1 


The standard definition of computational vision is
that it is inverse optics. The direct problem, the problem
of classical optics or computer graphics, is to
determine the images of three-dimensional objects.
Computational vision is confronted with inverse problems
of recovering surfaces from images. Much information
is lost during the imaging process that projects
a three-dimensional world into two-dimensional arrays
(images). As a consequence, vision must rely on natural
constraints, that is, general assumptions about the
physical world, to derive an unambiguous output. This
is typical of many inverse problems in mathematics and
physics.

In  fact,  the  common  characteristics  of  most  early 
vision  problems,  in  a  sense  their  deep  structure,  can 
be formalized: early vision problems are ill-posed in
the  sense  defined  by  Hadamard.  A  problem  is  well- 
posed  when  its  solution  (a)  exists,  (b)  is  unique  and 
(c)  depends  continuously  on  the  initial  data.  Ill-posed 
problems  fail  to  satisfy  one  or  more  of  these  criteria. 

Bertero, Poggio and Torre (1986) show precisely the
mathematically  ill-posed  structure  of  several  problems 
listed  in  Table  1  (see  also  Poggio  and  Torre,  1984.) 
The  recognition  that  early  vision  problems  are  ill-posed 
suggests  immediately  the  use  of  regularization  methods 
developed  in  mathematics  and  mathematical  physics 
for  solving  the  ill-posed  problems  of  early  vision  (Poggio 
&  Torre,  1984). 

1.3.  Standard  Regularization  in  Early  Vision 

The main idea for “solving” ill-posed problems is to
restrict the class of admissible solutions by introducing
suitable a priori knowledge. In standard regularization
methods, due mainly to Tikhonov, the regularization of
the ill-posed problem of finding $z$ from the data $y$:

$$Az = y$$

requires the choice of norms $\|\cdot\|$ and of a stabilizing
functional $\|Pz\|$. In standard regularization theory, $A$
is a linear operator, the norms are quadratic and $P$ is
linear. A method that can be applied is:

Find $z$ that minimizes

$$\|Az - y\|^2 + \lambda \|Pz\|^2, \qquad (1)$$

where $\lambda$ is a so-called regularization parameter.

In this method, $\lambda$ controls the compromise between the
degree of regularization of a solution and its closeness
to the data (the first term in equation 1). $P$ embeds
the physical constraints of the problem. It can be
shown for quadratic variational principles that under
mild conditions the solution space is convex and a
unique solution exists.
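As an illustration of why a unique solution exists in the quadratic case, consider a discretized version of equation 1. The finite-dimensional setting below (vectors $z \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, matrices $A$ and $P$, Euclidean norms) is an assumption made here for concreteness; the text itself is stated for general linear operators.

```latex
% Discretized form of equation (1), assuming Euclidean norms:
E(z) = \|Az - y\|^2 + \lambda \|Pz\|^2
     = (Az - y)^{T}(Az - y) + \lambda\, z^{T} P^{T} P z .

% Setting the gradient with respect to z to zero gives the normal equations:
\nabla E(z) = 2 A^{T}(Az - y) + 2 \lambda P^{T} P z = 0
\quad\Longrightarrow\quad
\left( A^{T} A + \lambda P^{T} P \right) z = A^{T} y .
```

For $\lambda > 0$ the matrix $A^{T}A + \lambda P^{T}P$ is positive definite, and the minimizer therefore unique, whenever no nonzero vector satisfies both $Az = 0$ and $Pz = 0$.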

Poggio et al. (1984, 1985) show that several problems in
early vision can be “solved” by standard regularization
techniques. Surface reconstruction, optical flow at
each point in the image, optical flow along contours,
color, and stereo can be computed by using standard
regularization techniques. Variational principles that
are not exactly quadratic but have the same form as
equation 1 can be used for other problems in early
vision. The main results of Tikhonov can, in fact, be
extended to some cases in which the operators $A$ and $P$
are nonlinear, provided they satisfy certain conditions
(Morozov, 1984).

Standard regularization methods can be implemented
very efficiently by parallel architectures of the fine grain
type, such as the Connection Machine (Hillis, 1985).
Analog networks, either electrical or chemical, can also
be a natural way of solving the variational principles
dictated by standard regularization theory (Poggio &
Koch, 1984, 1985). The problems that can be
regularized by standard regularization theory or slightly
non-linear versions of it are listed in Table 2, together
with the associated regularization principle.

1.4.  Limitations  of  Standard  Regularization  Theory 

This new theoretical framework for early vision shows
clearly not only the attractions, but also the limitations
that are intrinsic to the standard Tikhonov form of
regularization theory. Standard regularization methods
lead to satisfactory solutions of early vision problems
but cannot deal effectively and directly with a few
general problems such as discontinuities and fusion of
information from multiple modules.

Standard regularization theory with linear $A$ and $P$
is equivalent to restricting the space of solutions to
generalized splines, whose order depends on the order
of the stabilizer $P$. This means that in some cases
the solution is too smooth, and cannot be faithful in
locations where discontinuities are present. In optical
flow, surface reconstruction and stereo, discontinuities
are in fact not only present, but also the most critical
locations for subsequent visual information processing.

Standard regularization cannot deal well with another
critical problem of vision, the problem of fusing
information from different early vision modules. Since
the regularizing principles of the standard theory are
quadratic, they lead to linear Euler-Lagrange equations.
The output of different modules can therefore
be combined only in a linear way.

Terzopoulos (1984; see also Poggio et al., 1985) has shown
how standard regularization techniques can be used in the
presence of discontinuities in the case of surface interpolation.
After standard regularization, locations where the solution
gives rise to a large error in the second term of
equation 1 are identified (this requires setting a threshold
for the error in smoothness). A second regularization
step is then performed using the location of
discontinuities as boundary conditions.

A similar method could be used for fusing information
from multiple sources: a regularizing step could be
performed and locations where terms of the type of
the first term of equation (1) give large errors would be
identified. A decision step would then follow by setting
appropriately various controlling parameters in those
locations, therefore weighting in an appropriate way
(for instance, vetoing some of) the various contributing
processes.


Table 2

Problem                          Regularization Principle

Edge detection                   $\int [(Sf - i)^2 + \lambda (f_{xx})^2]\,dx$

Optical Flow (area based)        $\int [(i_x u + i_y v + i_t)^2 + \lambda (u_x^2 + u_y^2 + v_x^2 + v_y^2)]\,dx\,dy$

Optical Flow (contour based)     $\int [(V \cdot N - V^N)^2 + \lambda (\partial V / \partial s)^2]\,ds$

Surface Reconstruction           $\int [(S \cdot f - d)^2 + \lambda (f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2)]\,dx\,dy$

Spatiotemporal approximation     $\int [(S \cdot f - i)^2 + \lambda (\nabla f \cdot V + f_t)^2]\,dx\,dy\,dt$

Color                            $\| I^{\nu} - A z \|^2 + \lambda \| P z \|^2$

Shape from Shading               $\int [(E - R(f, g))^2 + \lambda (f_x^2 + f_y^2 + g_x^2 + g_y^2)]\,dx\,dy$

Stereo                           $\int \{ [\nabla^2 G * (L(x, y) - R(x + d(x, y), y))]^2 + \lambda (\nabla d)^2 \}\,dx\,dy$


In any case one would like a more comprehensive
and coherent theory capable of dealing directly with
the problem of discontinuities and the problem of
fusing information. So the challenge for a regularization
theory of early vision is to extend it beyond standard
regularization methods and their most obvious
non-linear versions.

1.5.  Stochastic  Route  to  Regularizing  Early  Vision 

In this paper, we will outline a rigorous approach to
overcome part of the ill-posedness of vision problems,
based on Bayes estimation and Markov Random Field
models, that deals effectively with the problems faced
by the standard regularization approach. In this
approach, the a priori knowledge is represented in terms
of an appropriate probability distribution, whereas in
standard regularization a priori knowledge leads to
restrictions on the solution space. This distribution,
together with a probabilistic description of the noise
that corrupts the observations, allows one to use Bayes
theory to compute the posterior distribution $P_{f|g}$, which
represents the likelihood of a solution $f$ given the
observations $g$. In this way, we can solve the reconstruction
problem by finding the estimate $\hat f$ which either
maximizes this a posteriori probability distribution (the
so-called Maximum a Posteriori or MAP estimate), or
minimizes the expected value (with respect to $P_{f|g}$) of
an appropriate error function. The class of solutions
that can be obtained in this way is much larger than
in standard regularization. In particular, we will show
under which conditions this new method leads to
solutions that are of the standard regularization type (see
section 3).

The price to be paid for this increased flexibility is
computational complexity. New parallel architectures,
and possibly hybrid computers of the digital-analog
type, promise however to deal effectively with the
computational requirements of the methods proposed
here. We will discuss these new parallel architectures
in some detail at the end of the paper.

2.  Probabilistic  Models 

The key to the success of this approach is
our ability to find a class of stochastic models (that is,
random fields) that have the following characteristics:

(i) The probabilistic dependencies between the
elements of the field should be local. This
condition is necessary if the field is to be used
to model surfaces that are only piecewise
smooth; besides, if it is satisfied, the reconstruction
algorithms are likely to be distributed,
and thus efficiently implementable in parallel
hardware.

(ii) The class should be rich enough, so that a
wide variety of qualitatively different behaviors
can be modeled.

(iii) The relation between the parameters of the
models and the characteristics of the corresponding
sample fields should be relatively
transparent, so that the models are easy to
specify.

(iv) It should be possible to represent the prior
probability distribution $P_f$ explicitly, so that
Bayes theory can be applied.

(v) It should be possible to specify efficient Monte
Carlo procedures, both for generating sample
fields from the distribution, so that the capability
of the model to represent our prior knowledge
can be verified, and for computing the optimal
estimators.

A class of random fields that satisfies these requirements
is the class of Markov Random Fields (MRF's)
on finite lattices (see Wong, 1960 and Woods, 1972). An
MRF has the property that the probability distribution of
the configurations of the field can always be expressed
in the form of a Gibbs distribution:

$$P_f(f) = \frac{1}{Z} \exp\left[-\frac{1}{T_0} U(f)\right]$$

where $Z$ is a normalizing constant, $T_0$ is a parameter
(known as the "natural temperature" of the field) and
the "Energy function" $U(f)$ is of the form:

$$U(f) = \sum_{C} V_C(f)$$

where $C$ ranges over the "cliques" associated with the
neighborhood system of the field, and the potentials
$V_C(f)$ are functions supported on them (a clique is
either a single site, or a set of sites such that any two
sites belonging to it are neighbors of each other).

As an example, the behavior of piecewise constant
functions can be modeled using first order MRF models
on a finite lattice $L$ with generalized Ising potentials
(Geman and Geman, 1984):

$$V_C(f_i, f_j) = \begin{cases} -1, & \text{if } |i - j| = 1 \text{ and } f_i = f_j \\ 1, & \text{if } |i - j| = 1 \text{ and } f_i \neq f_j \\ 0, & \text{otherwise} \end{cases}$$

$$f_i \in Q_i = \{q_1, \ldots, q_M\} \quad \text{for all } i \in L$$

We will use a free boundary model, so that the
neighborhood size for a given site will be: 4, if it
is in the interior of the lattice; 3, if it lies at a boundary
but not at a corner; and 2 for the corners.

The Gibbs distribution:

$$P_f(f) = \frac{1}{Z} \exp\left[-\frac{1}{T_0} U(f)\right]$$

$$U(f) = \sum_{i,j} V_C(f_i, f_j) \qquad (2)$$

defines a one parameter family of models (indexed by
$T_0$) describing piecewise constant patterns with varying
degrees of granularity.

We will assume that the available observations $g$ are
obtained from a typical realization $f$ of the field by a
degrading operation (such as sampling) followed by
corruption with i.i.d. noise (the form of whose distribution
is known), so that the conditional distribution can
be written as:

$$P_{g|f}(g; f) = \exp\left[-\alpha \sum_{i \in S} \Phi_i(f_i, g_i)\right] \qquad (3)$$

where $\{\Phi_i\}$ are some known functions, and $\alpha$ is a
parameter.

The posterior distribution is obtained from Bayes rule:

$$P_{f|g}(f; g) = \frac{1}{Z_P} \exp\left[-U_P(f; g)\right] \qquad (4)$$

with

$$U_P(f; g) = \frac{1}{T_0} \sum_{C} V_C(f) + \alpha \sum_{i \in S} \Phi_i(f_i, g_i) \qquad (5)$$

For example, in the case of binary fields ($M = 2$)
with the observations taken as the output of a binary
symmetric channel (BSC) with error rate $\epsilon$ (Gallager,
1975), we have:

$$P(g_i \mid f_i) = \begin{cases} 1 - \epsilon, & \text{for } g_i = f_i \\ \epsilon, & \text{for } g_i \neq f_i \end{cases}$$

The posterior energy reduces to:

$$U_P(f; g) = \frac{1}{T_0} \sum_{i,j} V_C(f_i, f_j) + \alpha \sum_{i} \left(1 - \delta(f_i - g_i)\right) \qquad (6)$$

where $f_i \in \{q_1, q_2\}$;

$$\delta(a) = \begin{cases} 1, & \text{if } a = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

and

$$\alpha = \ln\left(\frac{1 - \epsilon}{\epsilon}\right) \qquad (8)$$
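The energies of this section can be made concrete with a short sketch. It assumes a binary field stored as a list of rows, counts each neighboring pair once, and uses the BSC value of $\alpha$ from equation (8); the function names are illustrative and do not come from the paper.

```python
import math


def ising_prior_energy(f):
    """U(f) of equation (2): sum of the Ising potentials over all
    horizontally and vertically adjacent pairs (free boundary)."""
    rows, cols = len(f), len(f[0])
    u = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # right neighbor
                u += -1 if f[i][j] == f[i][j + 1] else 1
            if i + 1 < rows:  # lower neighbor
                u += -1 if f[i][j] == f[i + 1][j] else 1
    return u


def bsc_alpha(eps):
    """Noise parameter of equation (8) for a BSC with error rate eps."""
    return math.log((1.0 - eps) / eps)


def posterior_energy(f, g, t0, eps):
    """U_P(f; g) of equation (6) for a binary field observed through a BSC."""
    mismatches = sum(fv != gv
                     for f_row, g_row in zip(f, g)
                     for fv, gv in zip(f_row, g_row))
    return ising_prior_energy(f) / t0 + bsc_alpha(eps) * mismatches
```

For a 2 x 2 lattice there are four neighboring pairs, so a constant field has prior energy -4 and a checkerboard has prior energy +4.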


3.  Cost  Functionals 


The Bayesian approach to the solution of reconstruction
problems has been adopted by several researchers.
In most cases, the criterion for selecting the optimal
estimate has been the maximization of the posterior
probability (the Maximum a Posteriori or MAP estimate).
It has been used, for example, by Geman and Geman
(1984) for the restoration of piecewise constant images;
by Grenander (1984) for pattern reconstruction, and by
Elliot et al. (1983) and Hansen and Elliot (1982) for the
segmentation of textured images (a similar criterion,
the maximization of a suitably defined likelihood
function, has been used by Cohen and Cooper (1984)
for the same purposes).

In some other cases, a performance criterion such as
the minimization of the mean squared error has been
implicitly used for the estimation of particular classes
of fields. For example, for continuous-valued fields
with exponential autocorrelation functions, corrupted
by additive white Gaussian noise, Nahi and Assefi
(1972) and Habibi (1972) have used causal linear
models and optimal (Kalman) linear filters for solving
the reconstruction problem.

The  minimization  of  the  expected  value  of  error  func¬ 
tionals,  however,  has  not  been  used  as  an  explicit 
criterion  for  designing  optimal  estimators  in  the  general 
case.  We  will  show  that  this  design  criterion  is  in  fact 
more  appropriate  in  our  case,  for  the  following  reasons: 

(i)  It  permits  one  to  adapt  the  estimator  to  each 
particular  problem. 

(ii) It is in closer agreement with one's intuitive
assessment of the performance of an estimator.

(iii)  It  leads  to  attractive  computational  schemes. 

As  an  example,  we  will  now  propose  design  criteria 
for  two  particular  problems:  image  segmentation  and 
surface  reconstruction. 

Consider a field $f$ with $N$ elements, each of which
can belong to one of a finite set $Q_i$ of classes. Let
$f_i$ denote the class to which the $i$th element belongs.
The segmentation problem is to estimate $f$ from a
set of observations $g$. Note that $f_i$ does not
necessarily correspond to the image intensity. It may
represent, for example, the texture class for a region
in the image (as in Elliot et al., 1983), etc.

A reasonable criterion for the performance of an
estimate $\hat f$ is the number of elements that are not classified
correctly. Therefore, we define the segmentation error
$e_s$ as:

$$e_s(f, \hat f) = \sum_{i=1}^{N} \left(1 - \delta(f_i - \hat f_i)\right), \qquad f_i, \hat f_i \in Q_i \qquad (9)$$

In the case of the reconstruction problem, an estimate
$\hat f$ should be considered "good" if it is close to $f$ in the
ordinary sense, so that the total squared error:

$$e_r(f, \hat f) = \sum_{i=1}^{N} (f_i - \hat f_i)^2 \qquad (10)$$

will be a reasonable measure for its performance.

To derive the optimal estimators with respect to the
criteria stated above, we first present the general result
(which can be found, for example, in Abend, 1968)
which states that if the posterior marginal distributions
for every element of the field are known, the optimal
Bayesian estimator with respect to any additive, positive
definite cost functional $C$ may be found by independently
minimizing the marginal expected cost for each
element.

In more precise terms, we will consider cost functionals
$C(f, \hat f)$ of the form:

$$C(f, \hat f) = \sum_{i \in L} C_i(f_i, \hat f_i)$$

with $C_i(a, b) = 0$ if $a = b$, and $C_i(a, b) > 0$ otherwise.


We will assume that the value of each element $f_i$ of
the field $f$ is constrained to belong to some finite set
$Q_i$ (the generalization to the case of compact sets is
straightforward). The optimal Bayesian estimator $\hat f$
with respect to the cost functional $C$ is defined as the
global minimizer of the expected value of $C$ over all
possible $f$ and $g$. One can prove that this estimate can
be found by minimizing independently the marginal
expected cost for each element, i.e.,

$$\hat f_i = q \in Q_i : \sum_{r \in Q_i} C_i(r, q) P_i(r \mid g) \le \sum_{r \in Q_i} C_i(r, s) P_i(r \mid g)$$

for all $s \neq q$, and for all $i \in L$,

where $P_i(r \mid g)$ is the posterior marginal distribution of
the element $i$:

$$P_i(r \mid g) = \sum_{f : f_i = r} P_{f|g}(f; g)$$

The optimal estimators for the error criteria defined
above can be easily derived from this result.

In the case of the segmentation problem, we get that

$$\hat f_i = q \in Q_i : P_i(q \mid g) \ge P_i(s \mid g) \quad \text{for all } s \neq q \qquad (11)$$

We will call this estimate the "Maximizer of the Posterior
Marginals" ($\hat f_{MPM}$).

For the reconstruction problem, the optimal estimate
is:

$$\hat f_i = q \in Q_i : (\bar f_i - q)^2 \le (\bar f_i - s)^2 \quad \text{for all } s \neq q \qquad (12)$$

where $\bar f_i$ denotes the posterior mean of element $i$.
We will call this estimate the "Thresholded Posterior
Mean" ($\hat f_{TPM}$).



The main obstacle to the practical application of
these results lies in the formidable computational cost
associated with the exact computation of the marginals
and the mean of the posterior distribution given by (5),
even for lattices of moderate size. In the next section
we will present a general distributed procedure that will
permit us to approximate these quantities as precisely
as we may want.
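For a lattice small enough to enumerate, the marginals and both estimators can be computed exactly, which also shows why the cost grows so quickly: the loop below visits every one of the $|Q|^N$ configurations. This is a minimal sketch assuming a binary Ising prior and BSC observations (equation (6)), with fields stored as flat tuples in row-major order; all function names are illustrative.

```python
import itertools
import math


def posterior_energy(f, g, rows, cols, t0, alpha):
    """U_P(f; g) of equation (6); f and g are flat row-major tuples."""
    u = 0.0
    for i in range(rows):
        for j in range(cols):
            k = i * cols + j
            if j + 1 < cols:
                u += (-1 if f[k] == f[k + 1] else 1) / t0
            if i + 1 < rows:
                u += (-1 if f[k] == f[k + cols] else 1) / t0
            if f[k] != g[k]:
                u += alpha
    return u


def exact_marginals(g, rows, cols, t0, alpha, labels=(0, 1)):
    """Posterior marginals P_i(q | g) by brute-force enumeration."""
    n = rows * cols
    marg = [dict.fromkeys(labels, 0.0) for _ in range(n)]
    z = 0.0
    for f in itertools.product(labels, repeat=n):
        w = math.exp(-posterior_energy(f, g, rows, cols, t0, alpha))
        z += w
        for i, q in enumerate(f):
            marg[i][q] += w
    return [{q: p / z for q, p in m.items()} for m in marg]


def mpm_estimate(marg):
    """Equation (11): the label with the largest marginal at each site."""
    return [max(m, key=m.get) for m in marg]


def tpm_estimate(marg):
    """Equation (12): the label closest to the posterior mean at each site."""
    est = []
    for m in marg:
        mean = sum(q * p for q, p in m.items())
        est.append(min(m, key=lambda q: (mean - q) ** 2))
    return est
```

Even a 64 x 64 binary lattice has $2^{4096}$ configurations, so this enumeration is useful only as a check on approximate procedures.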

4.  Algorithms. 

The algorithms that we will propose are based on
the use of the Metropolis (Metropolis et al., 1953) or
Gibbs Sampler (Geman and Geman, 1984) schemes to
simulate the equilibrium behavior of the coupled MRF
described by equation (5). We recall that the Markov
chain generated by these algorithms is regular, and
that its invariant measure is the posterior distribution $P_{f|g}$.
The law of large numbers for regular chains (see, for
example, Kemeny and Snell, 1960) establishes that the
fraction of time that the chain will spend in a given
state $f$ will tend to $P_{f|g}(f; g)$ as the number of steps gets
large, independently of the initial state. This means that
we can approximate the posterior marginals by:

$$P_i(q \mid g) \approx \frac{1}{n - k} \sum_{t=k}^{n} \delta\left(f_i^{(t)} - q\right) \qquad (13)$$

and $\bar f$ by:

$$\bar f_i \approx \frac{1}{n - k} \sum_{t=k}^{n} f_i^{(t)} \qquad (14)$$

where $f^{(t)}$ is the configuration generated by the Metropolis
algorithm at time $t$, and $k$ is the time required for the
system to be in thermal equilibrium. From these values,
$\hat f_{MPM}$ and $\hat f_{TPM}$ can be easily computed using (11)
and (12).
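The procedure can be sketched for the binary Ising/BSC case of equation (6). The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses single-site Metropolis updates in raster order, starts the chain at the observations, counts one sample per full sweep after `burn_in` sweeps (playing the role of $k$), and all names are hypothetical.

```python
import math
import random


def metropolis_marginals(g, t0, alpha, n_sweeps, burn_in, seed=0):
    """Estimate P_i(1 | g) for a binary field, as in equation (13)."""
    rng = random.Random(seed)
    rows, cols = len(g), len(g[0])
    f = [row[:] for row in g]                 # initial state: the observations
    ones = [[0] * cols for _ in range(rows)]  # visits with f_i = 1
    for sweep in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                cur, new = f[i][j], 1 - f[i][j]
                # Change in the prior energy if site (i, j) is flipped.
                d_prior = 0
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= a < rows and 0 <= b < cols:
                        v_cur = -1 if f[a][b] == cur else 1
                        v_new = -1 if f[a][b] == new else 1
                        d_prior += v_new - v_cur
                # Change in the data term of equation (6).
                d_data = int(new != g[i][j]) - int(cur != g[i][j])
                delta = d_prior / t0 + alpha * d_data
                # Metropolis rule at fixed temperature T = 1.
                if delta <= 0 or rng.random() < math.exp(-delta):
                    f[i][j] = new
        if sweep >= burn_in:
            for i in range(rows):
                for j in range(cols):
                    ones[i][j] += f[i][j]
    n = n_sweeps - burn_in
    return [[c / n for c in row] for row in ones]


def mpm_from_marginals(p_one):
    """Equation (11) for binary labels: pick 1 where P_i(1 | g) > 1/2."""
    return [[1 if p > 0.5 else 0 for p in row] for row in p_one]
```

For binary labels the posterior mean of equation (14) is the same array as the estimated $P_i(1 \mid g)$, so the TPM and MPM estimates coincide in this case.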

This procedure is related to the use of simulated
annealing for finding the global minimum of $U_P$ (i.e.,
the MAP estimate; see Geman and Geman, 1984).

In our case, however, we are interested in gathering
statistics about the equilibrium behavior of the coupled
field at a fixed temperature $T = 1$, rather than in
finding the ground state of the system. This fact gives
our procedure some distinct advantages:

1. It is difficult to determine in general the descent
rate of the temperature (annealing schedule) that
will guarantee the convergence of the annealing
process in a reasonable time (it usually involves a
trial and error procedure). Since we are running
the Metropolis algorithm at a fixed temperature,
this issue becomes irrelevant.

2. Since in our case we are using a Monte Carlo
procedure to approximate the values of some
integrals, we should expect a nice convergence
behavior, in the sense that coarse approximations
can be computed very rapidly, and then refined
to an arbitrary precision (in fact, it can be proved
(see Feller, 1950) that the expected value of the
squared error of the estimates (13) and (14) is
inversely proportional to $n$).

The main disadvantage of this procedure is that, in the
case of the segmentation problem, a large amount of
memory might be required if the number of classes
per element $m$ is large (we need to store the $N(m - 1)$
numbers that define the posterior marginals).

With respect to the relative performance, we point
out that in many cases, particularly for high signal to
noise ratios, the MAP estimate is usually close to the
optimal one. If the noise level is high, however, the
difference in the performances of the two estimators
may be dramatic. This is illustrated in the example
portrayed in figure 1: panel (a) represents a typical
realization of a 64 x 64 binary Ising net with free
boundaries, using a value of $T_0 = 1.74$ (0.75 times
the critical temperature of the lattice); panel (b), the
output of a binary symmetric channel with error rate
$\epsilon = 0.4$; panel (c), the MAP estimate; and panel (d), an
approximation to the MPM estimate (which we will label
"MPM (M.C.)") obtained using the Metropolis algorithm
and equation (13) to estimate the posterior marginals.
The corresponding values of the posterior energy $U_P$
(equation (6)) and the relative segmentation error
($e_s / 64^2$) are shown in Table 3.

It is clear that the approximation to the MPM estimate
shown in panel (d) is better than the MAP estimate from almost
any viewpoint.

An intuitive explanation for this behavior comes from
the fact that the MAP estimator is implicitly minimizing
the expected value of a cost functional $C_{MAP}(f, \hat f)$
which is equal to zero only if $f_i = \hat f_i$ for all $i$, and
is equal to, say, $M$ otherwise. If the signal to noise
ratio is sufficiently high, the expected value of the
optimal segmentation error will be very close to zero,
so that $\hat f_{MPM}$ and $\hat f_{MAP}$ will coincide. In a high noise
situation, however, the MAP estimator will tend to be
too conservative, since from its viewpoint it is equally
costly to make one or one thousand mistakes. The MPM
estimator, in contrast, can make a better (although
more risky) guess, since making a few mistakes has
only a marginal effect on the expected cost.


Table 3

              $f$        $g$       $\hat f_{MAP}$   $\hat f_{MPM}$ (M.C.)   $\hat f_{MPM}$ (Det.)

Energy        -5594.8    -226.0    -6660.9          -6460.0                 -6427.0

Seg. Error    -          0.4       0.33             0.128                   0.124


Figure 1. (a) Sample function of a binary MRF. (b) Output of a
binary symmetric channel (error rate: 0.4). (c) MAP estimate. (d)
Monte Carlo approximation to the MPM estimate.


A quantitative comparison of the performances of
the MAP and MPM estimators, with respect to the
segmentation error, can be obtained using the ratio:

$$r = \frac{\sum_{f,g} \exp[-U_P(f; g)] \, e_s(f, \hat f_{MAP}(g))}{\sum_{f,g} \exp[-U_P(f; g)] \, e_s(f, \hat f_{MPM}(g))}$$

In figure 2 we show a plot of the ratio $r$ for a 2 x 2
lattice, for different values of the error rate $\epsilon$ and the
natural temperature $T_0$. As expected, $r$ is never less
than 1. In the worst case (for $\epsilon = 0.1$ and $T_0 = 0.2$) the
error of the MAP estimate is 1.17 times that of the MPM
estimate; if $T_0$ is not too small and $\epsilon$ is not too large,
both estimates coincide, and as $\epsilon$ approaches 0.5 (low
signal to noise ratio), the MPM estimate is consistently
better than the MAP. An experimental analysis of larger
lattices reveals a similar qualitative behavior, but the
values of $r$ are much larger in this case (see Table 3).


5. Examples of Applications in Vision


5.1. Reconstruction of Piecewise Constant Functions

The efficient solution of this problem is relevant for
several reasons: binary images (or images consisting
of only a few grey levels) are directly useful in
many interesting applications (for example, object
recognition and manipulation in restricted (industrial)
environments); besides, several perceptual problems,
such as the segmentation of textured images (Elliot
et al. (1983); Hansen and Elliot (1982); Cohen and
Cooper (1984)), or the formation of perceptual clusters
(Marroquin, 1985), can be reduced to the problem of
reconstructing a piecewise constant surface.

The prior model for this kind of function is given by
equations (1) and (2), and the posterior distribution
by equation (4). If the parameters that characterize
the system (namely, the "natural temperature" $T_0$ and
the noise parameter $\alpha$) are known, the MPM estimator
produces excellent results, such as the one illustrated
in figure 1.


Figure 2. Ratio of the average errors of the MAP and MPM
estimators for a 2 x 2 Ising net.
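The ratio $r$ of the average errors of the MAP and MPM estimators can be evaluated exactly for a 2 x 2 Ising net, since there are only 16 fields and 16 observation patterns. The sketch below assumes the BSC posterior energy of equation (6) with $\alpha = \ln((1 - \epsilon)/\epsilon)$, numbers the four sites row by row, and breaks ties by taking the first maximizer; the function names are illustrative.

```python
import itertools
import math

# Neighboring pairs of a 2 x 2 lattice, sites numbered 0 1 / 2 3.
PAIRS_2X2 = [(0, 1), (2, 3), (0, 2), (1, 3)]


def prior_energy(f):
    """Ising prior energy U(f) on the 2 x 2 lattice."""
    return sum(-1 if f[a] == f[b] else 1 for a, b in PAIRS_2X2)


def hamming(f, h):
    """Segmentation error e_s between two configurations."""
    return sum(a != b for a, b in zip(f, h))


def error_ratio(eps, t0):
    """Expected segmentation error of MAP divided by that of MPM."""
    alpha = math.log((1.0 - eps) / eps)
    configs = list(itertools.product((0, 1), repeat=4))
    err_map = 0.0
    err_mpm = 0.0
    for g in configs:
        # Unnormalized joint weights exp[-U_P(f; g)] for this g.
        w = {f: math.exp(-(prior_energy(f) / t0 + alpha * hamming(f, g)))
             for f in configs}
        total = sum(w.values())
        f_map = max(configs, key=w.get)
        p_one = [sum(w[f] for f in configs if f[i] == 1) for i in range(4)]
        f_mpm = tuple(1 if 2.0 * p > total else 0 for p in p_one)
        for f in configs:
            err_map += w[f] * hamming(f, f_map)
            err_mpm += w[f] * hamming(f, f_mpm)
    return err_map / err_mpm
```

Because the MPM estimate minimizes the expected segmentation error for every $g$, `error_ratio` can never return a value below 1 (up to rounding), whatever the tie-breaking rule.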

In most practical cases, however, we are only given the
noisy observations $g$ and general qualitative information
about the structure of the field and the noise, so that
$\alpha$ (which stands, for example, for the error rate $\epsilon$
when the noise corruption corresponds to a BSC, or
for the variance $\sigma^2$ in the case of additive Gaussian
noise) and $T_0$ have to be simultaneously estimated.

In principle, one could again use a Bayesian approach
and, assuming prior independent uniform distributions
for $\alpha$ and $T_0$ (in the ranges $[\alpha_0, \alpha_1]$ and $[T_0^{0}, T_0^{1}]$,
respectively), find those $\alpha$, $T_0$ and $\hat f$ which jointly
maximize the posterior distribution:

$$P(f, \alpha, T_0 \mid g) = \frac{\exp[-U_P(\alpha, T_0, f)]}{(\alpha_1 - \alpha_0)(T_0^{1} - T_0^{0}) \, Z(T_0) \, P_g(g)}$$

The main difficulty here is the extraordinary computational
complexity of the partition function:

$$Z(T_0) = \sum_{f} \exp\left[-\frac{1}{T_0} U(f)\right]$$

which makes this approach impractical, except for very
small lattices.

Another approach, with which we have obtained very
good results, consists of defining a merit function for
an estimate (obtained using a particular value for the
parameters), which is related to the degree of uniformity
in the spatial distribution of the corresponding
residuals. We have used, for example, a likelihood
function $L$, which we obtain by covering the lattice
with a set of $m$ non-overlapping squares (say, 8 pixels
wide), computing the relative variance of the noise
parameter, estimated over each square, and adding all
these terms together. Here $\hat\alpha$ and $\hat\alpha_j$ denote the
conditional (on $\hat f$) maximum likelihood estimates of the
noise parameter, obtained using the residuals over the
whole lattice and over the $j$th square, respectively.
The optimal estimate for $f$ is
then obtained as the global maximizer of $L$ over the
appropriate region of the parameter space. An example
of the performance of this scheme is presented in figure
3, which shows the restoration of a ternary pattern
corrupted by additive, white Gaussian noise.

Note that this estimation algorithm allows us to
reconstruct a pattern $f$ from the noisy observations $g$
without having to adjust any free parameters. The
only prior assumptions correspond to the qualitative
structure of the field $f$ (first order, isotropic MRF) and
to the nature of the noise process. In practice, this
means that we can apply it to restore any piecewise
uniform image with uniform granularity, even if it has
not been generated by a Markov random process. In
the particular case of a binary field sent through a
BSC, we have developed a very efficient procedure for
approximating the MPM estimator, which also permits
us to find the optimal (Maximum Likelihood) estimate
using only a one dimensional search (see Marroquin,
1985 for details). We have used this algorithm to
reconstruct a variety of binary images with excellent
results. In figure 4 we show such a restoration. The
observations (b) were generated from the synthetic
image (a) with an actual error rate of 0.35 (assumed
unknown). The MLE for $f$ is shown in (c).

5.2.  Reconstruction  of  Piecewise  Continuous 
Functions. 

In  this  section  we  will  illustrate  the  application  of  the 
local  spatial  interaction  models  and  estimation  tech¬ 
niques  that  we  have  described  to  the  reconstruction 
of  piecewise  continuous  functions  from  noisy  observa¬ 
tions  taken  at  sparse  locations. 

In this reconstruction, it will be important not only
to interpolate smooth patches over uniform regions,
but to locate and preserve the discontinuities that
bound these regions, since very often they are the
most important parts of the function. They may
represent object boundaries in vision problems (such as
image segmentation; depth from stereo; shape from
shading; structure from motion, etc.); geological faults
in geophysical information processing, etc.


Figure 3. (a) Original ternary MRF. (b) Noisy observations
(additive Gaussian noise). (c) Optimal (Maximum Likelihood)
estimate.


Figure 4. (a) Synthetic image. (b) Noisy observations. (c)
Maximum Likelihood Estimate.

As we mentioned in section 1.4, an approach to
this problem (see Terzopoulos (1984)) consists of,
first, interpolating an everywhere smooth function
over the whole domain; then, applying some kind
of discontinuity detector (followed by a thresholding
operation) to try to find the significant boundaries;
and finally, re-interpolating smooth patches over the
continuous subregions.

The results that have been obtained with this technique,
however, are not completely satisfactory. The main
problem is that the task of the discontinuity detector is
hindered by the previous smooth interpolation operation.
This becomes critical when the observations are
sparsely located, since in this case the discontinuities
may be smeared in the interpolation phase to such a
degree that it may become impossible to recover them
in the detection phase.
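The smearing effect is easy to reproduce in one dimension. The sketch below is an illustration under simplifying assumptions that are not in the text: a first-order (membrane) stabilizer, exact interpolation of the observed values, and Gauss-Seidel relaxation; the function names are hypothetical.

```python
def membrane_interpolate(n, fixed, n_iter=5000):
    """Smoothest (first-order) interpolant on n sites through the
    observed values in `fixed` (a dict: site index -> value)."""
    f = [0.0] * n
    for i, v in fixed.items():
        f[i] = v
    for _ in range(n_iter):
        for i in range(n):
            if i in fixed:
                continue  # observed sites are held at their values
            left = f[i - 1] if i > 0 else f[i + 1]
            right = f[i + 1] if i < n - 1 else f[i - 1]
            f[i] = 0.5 * (left + right)  # discrete Laplace equation
    return f


def largest_jump(f):
    """Largest absolute difference between adjacent sites."""
    return max(abs(b - a) for a, b in zip(f, f[1:]))


# A unit step observed only at the two end points of 11 sites:
sparse = membrane_interpolate(11, {0: 0.0, 10: 1.0})
# The same step observed at every site:
dense = membrane_interpolate(11, {i: (0.0 if i < 6 else 1.0) for i in range(11)})
```

With dense observations the largest jump is the full step height, 1; with observations only at the end points the interpolant is a linear ramp whose largest jump is about 0.1, so a gradient threshold that separates edges from smooth slopes has nothing left to detect.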

In contrast, in the Bayesian approach, the boundary
detection and interpolation tasks are performed at
the same time. To apply the general reconstruction
algorithms developed above to this problem, the main
issue is the representation of the concept of "piecewise
continuity" in the form of a prior Gibbs distribution in
a meaningful way.

A flexible construction involves the use of two coupled
MRF models: one to represent the function (the
"surface") itself, and another to model the curves
where the field is discontinuous. A coupled model of
this kind was first used by Geman and Geman (1984)
in the context of the restoration of piecewise constant
images. Terzopoulos (1985) has recently attempted to
translate this idea into the continuous and deterministic
framework of standard regularization.

This model can be adapted to our problem by modifying
the choice of the potentials and the neighborhood
structure of the coupled MRFs. Specifically, the
following modifications are needed:

1. Since in our case the observations are sparse,
it becomes necessary to expand the size of the
neighborhoods of the line field, to prevent the formation
of "thick" boundaries between the smooth patches
(i.e., adjacent, parallel segments of active lines in
these regions). In particular, we propose that the dual
lattice be 8-connected, with non-zero potentials for
the cliques of the form illustrated in figure 5 (a) and
(b). The inclusion of the cliques of figure 5 (b) has
the additional advantage of penalizing the occurrence
of sharp turns, permitting us to model the formation
of piecewise smooth boundaries using a binary line
process instead of the 4-valued process proposed by
Geman and Geman. The potentials for these cliques
are computed in the following way:

Let $V_a$, $V_b$ denote the potentials associated with the
cliques $C_a$, $C_b$ of figure 5 (a) and (b), respectively, and
let $S_k$ ($k \in \{a, b\}$) denote the number of line elements
belonging to $C_k$ that are "on" at a given time, i.e.,

$$S_k = \sum_{i \in C_k} l_i, \qquad k = a, b$$

The potentials $V_k$ are given by:

$$V_k = \beta \, \Phi_k(S_k), \qquad k = a, b$$

where $\beta$ is a constant, and the functions $\Phi_k$ are defined
by the following tables:

$S_a$        0      1      2      3      4
$\Phi_a$     0      0.4    0.25   1.2    2.0

$S_b$        0      1      2
$\Phi_b$     0      0      10

It is not difficult to see that this choice of potentials
will effectively discourage both the formation of thick
boundaries ($S_b = 2$) and the presence of sharp turns
($S_a = 3$ and/or $S_b = 2$).


        O x O
        x   x          x O x          x
        O x O                         O
                                      x

         (a)                    (b)

Figure 5. Cliques for the line process.

2. The potentials of the depth process, which is now
continuous-valued, have to be modified to express the
more relaxed condition of piecewise continuity (instead
of piecewise constancy). Specifically, we propose:

$$V_C(f_i, f_j, l_{ij}) = \begin{cases} (f_i - f_j)^2 (1 - l_{ij}), & \text{for } |i - j| = 1 \\ 0, & \text{otherwise} \end{cases}$$

(note that $l_{ij} \in \{0, 1\}$)

3. Unlike the case of piecewise constant surfaces,
we now have to worry about the maximum absolute
difference in the values of two adjacent depth sites that
we are willing to consider as a "smooth" gradient (and
not a discontinuity). This value, which in general is
problem-dependent, determines the magnitude of the
constant $\beta$ in the potentials $V_k$, which can be interpreted
as the coupling strength between the two processes.

Assuming that the observations are corrupted by i.i.d.
Gaussian noise, we get the following expression for
the posterior energy:

$$U_P(f, l; g) = \sum_{i,j} (f_i - f_j)^2 (1 - l_{ij}) + \alpha \sum_{i \in S} (f_i - g_i)^2 + \sum_{C_a} V_a(l) + \sum_{C_b} V_b(l)$$

where $S$ is the set of sites where an observation is
present. As a performance criterion we will use a mixed
cost functional of the form:

$$e_m(f, \hat f, l, \hat l) = \sum_{i \in L_f} (f_i - \hat f_i)^2 + \sum_{j \in L_l} \left(1 - \delta(l_j - \hat l_j)\right)$$

where $L_f$, $L_l$ denote the depth and line lattices,
respectively. This error criterion means that the reconstructed
surface should be as close as possible to the true
(unknown) surface, and that we should commit as few
errors as possible in the assertions about the presence
or absence of discontinuities.


Applying the results of section 3, we find that the
optimal estimators will be the posterior mean for f and
the maximizer of the posterior marginals for l.
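As a rough illustration of these two estimators, the sketch below assumes that a list of joint samples (f, l) from the posterior is already available (for example from a Gibbs sampler run). The function name `estimate_from_samples` and the sample format are hypothetical and are not part of the original text.

```python
def estimate_from_samples(samples):
    """samples: list of (f, l) pairs; f is a list of floats, l a list of 0/1 ints."""
    n = len(samples)
    n_f = len(samples[0][0])
    n_l = len(samples[0][1])
    # Posterior mean of each depth site (minimizes the expected squared error).
    f_hat = [sum(s[0][i] for s in samples) / n for i in range(n_f)]
    # Maximizer of the posterior marginal of each binary line site:
    # choose 1 when it appears in more than half of the samples.
    l_hat = [1 if 2 * sum(s[1][j] for s in samples) > n else 0
             for j in range(n_l)]
    return f_hat, l_hat


# Three hypothetical samples of a two-site depth field and one line element.
samples = [([1.0, 2.0], [0]), ([1.2, 2.2], [1]), ([0.8, 1.8], [1])]
f_hat, l_hat = estimate_from_samples(samples)
```

In practice the samples would be taken after the chain has approached equilibrium; that burn-in step is omitted here.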

There is one serious difficulty that prevents us from
applying directly the general Monte Carlo procedure
that was derived above to the computation of these
optimal estimates: since the depth variables are
continuous-valued, if we discretize them finely enough
to guarantee sufficient precision of the results, the
computational complexity of either the Metropolis or
Gibbs Sampler algorithms will be very large. One way
around this difficulty is to note that for any fixed
configuration of the line field, the posterior energy
becomes a non-negative definite quadratic form:

    U_P(f \mid l) = \sum_{i,j:\, l_{ij} = 0} (f_i - f_j)^2 + \alpha \sum_{i \in S} (f_i - g_i)^2 + K \qquad (15)

where α and K are constants (note that the first sum is
taken only over those pairs of sites whose connecting
line element is "off", and the second one over the set
S). This means that the posterior distribution of the
depth field is conditionally Gaussian, so that, for any
fixed l, we can find the optimal conditional estimator
f_l^* as the minimizer of (15).

Let us define the set F^* as:

    F^* = \{ (f, l) : f = f_l^* \}

It is clear that, if \hat{f}, \hat{l} are the optimal estimates for our
problem, we have that:

    (\hat{f}, \hat{l}) \in F^*

which suggests that we can constrain the search for
the optimal estimators to this set. This can be done,
in principle, by replacing the posterior energy with the
function:

    U_P(f_l^*, l; g)

(which depends only on l), and using the standard Monte
Carlo procedures to find the optimal estimator \hat{l}. To
illustrate this idea, let us consider a physical model in
the next section.

5.2.1.  Hybrid  parallel  computers 

It is well known that the steady state of an electrical
network that contains only (current or voltage) sources
and linear resistors will be the global minimizer of a
quadratic functional that corresponds to the total power
dissipated as heat (Oster et al., 1971). It is therefore
possible to construct an analog network that will find
the equilibrium state of the depth field for a given, fixed
configuration of the line process, i.e., that will minimize
the conditional energy (8) (see Poggio and Koch,
1984; also Poggio et al., 1985). This suggests a hybrid
computational scheme in which the line field (whose
state is updated digitally, using, say, the Metropolis or
Gibbs Sampler algorithms) acts as a set of switches
on the connections between the nodes of the analog
network whose voltages represent the depth process.
In particular, if f_i represents the voltage at node i, the
hybrid network can be represented as a 4-connected
lattice of nodes (see figure 6) in which:

(i) A resistance (of unit magnitude) and a switch
(controlled by the line element l_ij) are present
in every link between pairs i, j of adjacent
nodes.

(ii) If an observation g_i is present at site i, a
current of magnitude equal to α g_i is injected
into the corresponding node, which must also
be connected to a common ground via a
resistance of magnitude 1/α (see equation 8).

A direct application of Kirchhoff's current law shows that
at each node i of this network we will have:

    \sum_{j \in N_i} (f_i - f_j)(1 - l_{ij}) + \alpha q_i f_i = \alpha q_i g_i

(with q_i = 1 if an observation is present at site i, and
q_i = 0 otherwise), which corresponds to the condition

    \nabla U_P(f \mid l) = 0

so that the equilibrium configuration coincides with f_l^*.
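The node equations can also be solved numerically for a fixed line field. The following is only a sketch of that idea, using Gauss-Seidel relaxation as a software stand-in for the analog network; the data layout (`h` and `v` arrays for horizontal and vertical line elements, `None` for a missing observation) and the function names are assumptions made for illustration, and every connected region is assumed to contain at least one observation.

```python
def off_neighbors(r, c, rows, cols, h, v):
    """Neighbors of (r, c) whose connecting line element is "off" (0).

    h[r][c] is the line element between (r, c) and (r, c + 1);
    v[r][c] is the line element between (r, c) and (r + 1, c).
    """
    out = []
    if c + 1 < cols and h[r][c] == 0:
        out.append((r, c + 1))
    if c > 0 and h[r][c - 1] == 0:
        out.append((r, c - 1))
    if r + 1 < rows and v[r][c] == 0:
        out.append((r + 1, c))
    if r > 0 and v[r - 1][c] == 0:
        out.append((r - 1, c))
    return out


def relax(g, h, v, alpha=1.0, sweeps=200):
    """Gauss-Seidel sweeps on the node equations; g[r][c] is None if unobserved."""
    rows, cols = len(g), len(g[0])
    f = [[0.0] * cols for _ in range(rows)]
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                nb = off_neighbors(r, c, rows, cols, h, v)
                q = 0 if g[r][c] is None else 1
                denom = len(nb) + alpha * q
                if denom == 0:
                    continue  # isolated, unobserved node: value is undetermined
                num = sum(f[a][b] for a, b in nb)
                if q:
                    num += alpha * g[r][c]
                f[r][c] = num / denom
    return f


def max_residual(f, g, h, v, alpha=1.0):
    """Largest violation of sum_j (f_i - f_j)(1 - l_ij) + alpha q_i (f_i - g_i) = 0."""
    rows, cols = len(g), len(g[0])
    worst = 0.0
    for r in range(rows):
        for c in range(cols):
            res = sum(f[r][c] - f[a][b]
                      for a, b in off_neighbors(r, c, rows, cols, h, v))
            if g[r][c] is not None:
                res += alpha * (f[r][c] - g[r][c])
            worst = max(worst, abs(res))
    return worst


# One row of four sites, observations at both ends,
# and one line element switched "on" in the middle.
g = [[1.0, None, None, 2.0]]
h = [[0, 1, 0]]
v = []  # no vertical links in a single-row lattice
f = relax(g, h, v)
```

With the middle switch open, the two halves of the row settle independently, so the step between heights 1.0 and 2.0 is preserved rather than smoothed.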

This scheme can be used, in principle, to construct a
special purpose hybrid computer for the fast solution
of problems of this type. In a digital machine, the
exact implementation of this strategy will in general
be computationally very expensive, since f_l^* must
be computed every time a line site is updated. It is
possible, however, to develop approximations which
have an excellent experimental performance, and lead
to efficient implementations (Marroquin, 1985). The
performance of this method is illustrated in figure 7, in
which we show (with height coded by grey level) the
observations (a); the initial state of the network (with
all the lines turned "off") (b); the final reconstructed
surface (c), and the boundaries found by the algorithm
(d), for a square at height 2.0 over a background at
constant height 1.0.

6.  Signal  Matching 

In all the estimation problems we have studied so far,
the posterior energy function had the form:

    U_P(f; g) = \sum_i \Phi_i(f_i, g_i) + U_0(f)

where U_0(f) corresponded to the MRF model for the
field f. The functions Φ_i, whose precise form depended
on the particular noise model, were non-decreasing
functions of the distance between f_i and g_i.

There are some cases, however, when the conditional
probability distribution of the observations P_{g|f}(g; f)
is multimodal (as a function of f), which causes
the functions Φ_i to be non-monotonic, so that the
solution to the problem remains ambiguous, even if
the observations are dense and the signal to noise
ratio arbitrarily high. To illustrate this situation, we will
study an important instance of it: the "signal matching"
problem, whose one-dimensional version is as follows:


Figure 6. Hybrid network implementing the surface reconstruction
algorithm of section 4. The voltage at every node represents
the height of the surface. Inside every rectangular box there
is a resistance of unit magnitude and a switch whose state is
controlled by the corresponding line element (see text).


Consider two one-dimensional, real valued sequences
h_L, h_R, where h_L is obtained from h_R by shifting some
subintervals according to the "disparity sequence" d:

    h_L(i) = h_R(i + d_i)

with

    d_i \in Q = \{ -m, -m + 1, \ldots, -1, 0, 1, \ldots, m \}

The signal matching problem is to find d given h_L, h_R.
(In a more realistic situation, we do not observe h_L, h_R
directly, but rather some noise-corrupted versions
g_L, g_R.) Some interesting instances of this problem are
the matching of stereoscopic images along epipolar
lines (Marr and Poggio, 1979); the computation of
the dip angle of geological structures from electrical
resistivity measurements taken along a bore hole, and
the matching of DNA sequences.

To make the discussion more specific, we will consider
a simple example, in which the sequences h_L, h_R are
binary Bernoulli sequences; we will assume that the
noise corruption process can be modeled as a binary
symmetric channel with known error rate, and that d
is known to be a piecewise constant function. A well
known instance of this problem is the matching of a
row of a random dot stereogram with density p (Julesz
(1960)), when the components of the stereo pair are
corrupted by noise.

The stochastic model for the observations is then
constructed by assuming that the right image is a
sample function of a Bernoulli process A with parameter
p:

    g_R(i) = A(i)

The left image is assumed to be formed from the right
one by shifting it by a variable amount given by the
disparity function d, except at some points where an
error is committed with probability ε. Note that some
regions that appear in the right image will be occluded
in the left one (see figure 8). The "occlusion indicator"
φ_d can be computed deterministically from d in the
following way:

    \phi_d(i) = \begin{cases} 1, & \text{if } d_{i-k} \ge d_i + k \text{ for some integer } k \in (0, m] \\ 0, & \text{otherwise} \end{cases} \qquad (16)
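A direct transcription of the occlusion indicator might look like the sketch below. It assumes the reading of equation (16) in which site i is marked as occluded when d_{i-k} >= d_i + k for some integer k between 1 and m, which is the condition under which an earlier site lands on or beyond the position matched by site i. The function name and the handling of indices that fall outside the row (they are simply skipped) are illustrative assumptions.

```python
def occlusion_indicator(d, m):
    """Return phi with phi[i] = 1 if d[i - k] >= d[i] + k for some k in 1..m."""
    n = len(d)
    phi = [0] * n
    for i in range(n):
        for k in range(1, m + 1):
            # Sites to the left of the row boundary are ignored (assumption).
            if i - k >= 0 and d[i - k] >= d[i] + k:
                phi[i] = 1
                break
    return phi


# A disparity step from 2 down to 0: the two sites just after
# the step are flagged as occluded.
phi = occlusion_indicator([2, 2, 2, 0, 0, 0], m=2)
```

Each site needs up to m comparisons, which is why the cost of evaluating the indicator grows with the number of allowed disparities.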

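The occluded areas are assumed to be "filled in" by
an independent Bernoulli process D. The final model
is then:

    g_L(i) = \begin{cases} g_R(i + d_i), & \text{with prob. } 1 - \epsilon, \text{ if } \phi_d(i) = 0 \\ 1 - g_R(i + d_i), & \text{with prob. } \epsilon, \text{ if } \phi_d(i) = 0 \\ D(i), & \text{with prob. } 1, \text{ if } \phi_d(i) = 1 \end{cases} \qquad (17)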

Note that in the two-dimensional case, the index i
denotes a site of a lattice, and therefore it can be
represented as a two-vector (i_1, i_2) whose components
denote the column and row of the site, respectively.
To simplify the notation, we will adopt the following
convention throughout this section: when a scalar
is added to this vector index (as in g_R(i + d_i) and
d_{i-k}), it will be implicitly assumed that it is multiplied
by the vector (1, 0) (so that the above expressions
should be understood as g_R(i + (d_i, 0)) and d_{i-(k,0)},
respectively). Using this convention, the observation
model of equation (17) can be applied either to the
one or to the two-dimensional cases.

Notice that even if the observations are noise-free
(ε = 0) the solution of the problem remains ambiguous,
and it cannot be uniquely determined unless some
prior knowledge about d (for example, in the form
of a MRF model) is introduced. The use of a MRF

Figure 7. (a) Observations of 3 rectangles at heights 2.0, 3.0
and 2.0 over a background at height 1.0 (height coded by grey
level; a white pixel means that the observation is absent at that
point). (b) Equilibrium state of the network with all lines turned
"off". (c) Optimal estimate.


model in the stereo matching case corresponds to a
quantification of the assumption of the existence of
"dense solutions" (this term was introduced by Julesz
(1960), and essentially corresponds to the assumption
that the disparity d varies smoothly in most parts of the
image; see also Marr and Poggio (1979)), and the use
of the occlusion indicator corresponds to the "ordering
constraint" (i.e., the requirement that if i > j, then
i + d_i > j + d_j; see Baker (1981); we put φ_d = 1
whenever this constraint is violated).

To formulate the estimation problem, we will consider
the sequence g_L as "observations," while g_R will play
the role of a set of parameters. Thus, from (17), we
have (assuming, for simplicity, that p = 1/2):

    P(g_L(i) = k \mid d, g_R) = P_i(k) = \begin{cases} 1 - \epsilon, & \text{if } \phi_d(i) = 0 \text{ and } g_R(i + d_i) = k \\ \epsilon, & \text{if } \phi_d(i) = 0 \text{ and } g_R(i + d_i) \ne k \\ 1/2, & \text{if } \phi_d(i) = 1 \end{cases}
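To make the three-case likelihood concrete, the sketch below evaluates the negative log-likelihood of one left-image row for a candidate disparity sequence. It assumes p = 1/2 as in the text; the function name `data_energy`, the argument layout, and the choice to treat matches that fall outside the row as occluded are illustrative assumptions, not part of the original formulation.

```python
import math


def data_energy(g_left, g_right, d, phi, eps):
    """Return -sum_i ln P(g_left[i] | d, g_right) under the three-case model."""
    n = len(g_left)
    total = 0.0
    for i in range(n):
        j = i + d[i]
        if phi[i] == 1 or not (0 <= j < n):
            # Occluded site (or match outside the row: assumption): P = 1/2.
            total += math.log(2.0)
        elif g_right[j] == g_left[i]:
            total -= math.log(1.0 - eps)  # agreement, P = 1 - eps
        else:
            total -= math.log(eps)        # disagreement, P = eps
    return total


# Hypothetical row: the left image is the right image shifted by one site.
g_right = [0, 1, 1, 0, 1, 0, 0, 1]
g_left = [1, 1, 0, 1, 0, 0, 1, 0]
no_occlusion = [0] * 8
e_true = data_energy(g_left, g_right, [1] * 8, no_occlusion, eps=0.1)
e_zero = data_energy(g_left, g_right, [0] * 8, no_occlusion, eps=0.1)
```

The correct shift produces a much lower data term than zero disparity, but several different disparity sequences can still give similar values, which is why a prior on d is needed.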

As a prior model for the disparity field, we may use a
first order MRF with generalized Ising potentials, such
as the one presented in section 5.1. Other models
may also be used, including the coupled depth and
line fields that we discussed in the previous section.
For the present, let us assume that the simpler Ising
model is adequate. Note that even when the matching
problem is one-dimensional (we are assuming that there
is no vertical disparity between the images, so that the
matching can be done on a row-by-row basis), the
two-dimensional nature of the prior MRF model for the
disparity introduces a coupling between matches at
adjacent rows. The posterior energy is:

    U_P(d; g) = \frac{1}{T_0} \sum_{i,j} V(d_i, d_j) + \sum_i \phi_d(i) \ln 2 + \sigma \sum_i (1 - \phi_d(i)) \, \delta(g_L(i) - g_R(i + d_i)) \qquad (18)

where

    \sigma = \ln\left( \frac{\epsilon}{1 - \epsilon} \right)

It is possible to apply the general Monte Carlo algorithms
presented above to approximate the optimal estimate
\hat{d} with respect to a given performance measure
(such as the mean squared error). Their use in this
case, however, is complicated by the introduction of
the occlusion function φ_d in the posterior energy: the
size of the support for this function equals the total
number of allowed values for the disparity (see equation
(16)). If this number is large, the computation of the
increment in energy, or of the conditional distributions
(if the Gibbs Sampler is used) may be quite expensive.
In many cases, however, the size of the regions of
constant disparity is relatively large compared with the
size of the occluded areas. In these cases, one can
approximate the posterior energy by:

    U_P(d) = \frac{1}{T_0} \sum_{i,j} V(d_i, d_j) + \sigma \sum_i \delta(g_L(i) - g_R(i + d_i))

and increase significantly the computational efficiency.
It is also possible, particularly for the high signal to noise
ratio case, to design deterministic, highly distributed
algorithms for the efficient computation of the optimal
estimator. The details of these designs can be found
in Marroquin, 1985.

To illustrate the performance of this approach, we
present in figure 9 a random dot stereogram portraying
a square floating over a uniform background (panel
(a)), and the reconstructed surface (panel (b)).

7.  Parallel  Implementations. 


7.1.  Connection  machine  architectures. 

The  general  Monte  Carlo  procedure  that  we  have 
presented  for  the  approximation  of  the  optimal  Bayesian 


Figure 8. Occluded Regions: The horizontal and vertical axes
represent points in one row of the left and right images,
respectively. Matching points are represented by black circles.
Any match in the shaded region will occlude the point •


estimators of MRF's can be greatly accelerated if it
is implemented in a parallel architecture. A necessary
condition for the convergence of the probability
measures of the Markov chains defined by the
Metropolis or Gibbs Sampler algorithms to the posterior
Gibbs distribution (and therefore, for the convergence
of the approximations given by equations (13)
and (14) to the desired estimates) is that if two sites
belong to the same clique, they are never updated at
the same time. It is important to note, however, that
this condition is also sufficient only for the case of
the Gibbs sampler: if one updates simultaneously the
states of all non-neighboring sites, the reversibility of
the resulting chain will be destroyed, so that it will
no longer be possible to guarantee the convergence
of the Metropolis algorithm to the desired result (see
Marroquin, 1985).

If  one  implements  the  Gibbs  sampler  in  a  parallel 
architecture  in  which  a  processor  is  assigned  to  each 
site,  the  total  execution  time  will  be  reduced  by  a  factor 


K 

where K is the so-called "chromatic number" of the
graph that describes the neighborhood structure; it
is equal to the minimum number of colors needed to
color the sites of the lattice in such a way that no two
neighbors have the same color.
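For a first order MRF on a 4-connected lattice, a checkerboard pattern is one coloring with two colors. The sketch below checks that no two neighbors share a color and groups the sites into update phases, so that all sites of one color could be updated simultaneously. The function names are hypothetical, and the code only illustrates the scheduling idea, not an actual Connection Machine program.

```python
def checkerboard(r, c):
    """Color of site (r, c): two colors suffice for a 4-connected lattice."""
    return (r + c) % 2


def is_proper(rows, cols, color):
    """True if no two 4-connected neighbors share a color."""
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and color(r, c) == color(r, c + 1):
                return False
            if r + 1 < rows and color(r, c) == color(r + 1, c):
                return False
    return True


def update_phases(rows, cols, color, n_colors):
    """Group sites by color; all sites in one group may be updated together."""
    phases = [[] for _ in range(n_colors)]
    for r in range(rows):
        for c in range(cols):
            phases[color(r, c)].append((r, c))
    return phases


phases = update_phases(4, 5, checkerboard, 2)
```

With one processor per site, a full sweep then takes one step per color rather than one step per site.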

An example of such a massively parallel architecture is
the "Connection Machine" (Hillis, 1985), currently under
construction at Thinking Machines Corp. and at the
Artificial Intelligence Laboratory at MIT. This machine
was originally designed for the parallel processing
of structured symbolic expressions, such as frames
and semantic networks. It is a "Single Instruction
Multiple Data" (SIMD) array processor consisting of
256,000 processing units (each with a single bit
Arithmetic/Logical unit, and about 4K bits of storage)
organized in a four-connected lattice that is 512 elements
square. Besides this nearest-neighbor connectivity,
it will also be possible (although computationally
more expensive) to connect any two processors in the
array using a "Cross Omega" router network (Knight,
in Winston, 1984).

At each cycle of the machine, for which we will assume a
duration of one microsecond, an instruction is executed
by each processor, and a single bit is transmitted to
its neighbors. This means that the updating scheme
can be implemented most efficiently if the field is first
order Markov, but higher order processes can also be
implemented without using the router by successively
propagating the transmitted state (the execution time,
therefore, will grow linearly with the order of the field).

To make this discussion more concrete, consider, as
an example, the problem of finding the optimal estimate
for an M-ary, first order MRF with Ising potentials (i.e.,
the segmentation of a piecewise constant image) from
noisy observations. Let us assume that the estimator is
to be implemented in the "Connection Machine", and
suppose that by the use of appropriate scaling factors,
all the numbers can be represented as 16-bit integers.
We will use the following conservative assumptions:
16 cycles of a single 1-bit processor
are needed to perform 16-bit addition, subtraction
or comparison; 16^2 cycles to perform multiplication
or division; 2 x 16^2 cycles for generating a pseudo-random
number with uniform distribution on a given
interval; 16 cycles for memory transfer operations, and
6 x 16^2 cycles for computing an exponential.

Assuming that we run 250 iterations of the system, and
ignoring the overhead time, we get that

    Exec. Time ≈ 1.4 (M - 1) seconds

For the particular case of binary images, we have
developed a deterministic scheme for which this execution
time can be reduced by an order of magnitude
(see Marroquin, 1985).

In the case of the reconstruction of piecewise smooth
functions from sparse data, the optimal estimator
can also be implemented in this machine. To study
this implementation, we first note that the chromatic
numbers of the graphs associated with the line and
depth neighborhood systems are 4 and 2, respectively,


Figure 9. (a) Random dot stereogram. (b) Reconstructed
surface.


which means that the coupled process has a chromatic
number of 6. In figure 10 (a) we illustrate one possible
"coloring".

The colors of the line process are represented by the
numbers 1, 2, 3, 4, and those of the depth process by
white and black circles. The updating process can be
implemented in a 4-connected architecture such as the
"Connection Machine", by assigning one processor to
each depth site and its four adjacent line elements. We
will thus have two different populations of processors,
whose configurations are shown in figures 10 (b) and
(c), respectively.

Each complete iteration consists of 6 major cycles: in
the first two, the states of the white and black depth
variables are respectively updated, and in the next four,
the new states of the binary line variables stored in (say)
the white processors are successively computed and
transmitted to the corresponding memory locations of
the neighboring black processors. Note that in this
scheme we have some redundancy in the use of
memory (each binary variable is stored twice), but
the state of all the elements needed for each updating
operation is always available from adjacent processors.
Considering that the Monte Carlo algorithm requires
about 200 iterations to converge, we estimate in this
case an execution time of approximately 2.5 seconds,
independently of the lattice size. As before, we have
also developed in this case a deterministic scheme
with very good experimental performance, for which
the execution time can be reduced by at least an order
of magnitude.

7.2.  Hybrid  analog-digital  computers  and  Hopfield 
networks 

As we mentioned in section 4.2.1, the reconstruction
of piecewise continuous functions can be achieved
by coupling two MRFs, one corresponding to the
continuous field and the other to the discontinuities.
From this scheme we have suggested a special purpose
parallel computer consisting of an analog network of
resistances - corresponding to the continuous intensity
field - and a digital network - corresponding to the
line process, coupled via D-A and A-D converters. The
idea suggested by computer experiments (Marroquin,
1985) is that the two processes can run on different
time scales, a slow one for the digital part and a fast one
for the analog network. In this way the two processes
are effectively decoupled and the continuous field finds
its equilibrium effectively instantaneously after each
update of the line process. Koch, Marroquin and Yuille
(1985) discuss implementations of this idea. This
idea can be extended to multilayered hybrid networks,
each layer corresponding to a MRF and being digital or
analog depending on the continuous or binary nature of
the field. Hybrid multilayered architectures of this type
are especially attractive for implementing the fusion of
several vision processes.

Finally, we mention that Koch et al. (1985) have
been experimenting successfully with a special type
of analog networks - Hopfield networks - whose
equilibrium states correspond to approximations of the
optimal estimators.

8.  Conclusions. 

In this paper we have presented a probabilistic approach
to the solution of a class of perceptual problems.
We showed that these problems can be reduced
to the reconstruction of a function on a finite lattice
from a set of degraded observations, and derived the
Bayesian estimators that provide an optimal solution.


Figure 10. (a) Coloring of the coupled line-depth lattice. (b)
and (c) Elements whose state is stored in each of the two types
of processors of a 4-connected parallel architecture.


We have also developed efficient distributed algorithms
for the computation of these estimates, and discussed
their implementation in different kinds of hardware.
To demonstrate the generality and practical value of
this approach, we studied in detail several applications:
the segmentation of noise-corrupted images;
the reconstruction of piecewise smooth surfaces from
sparse data and the reconstruction of depth from
stereoscopic measurements.

8.1.  Connection  with  Standard  regularization 

The maximum a posteriori (MAP) estimate of a MRF
is obviously similar to a variational principle of the
general form of equation (3), since the use of this
criterion defines the optimal estimator as the global
minimizer of the posterior energy U_P (equation 6):
the first term measures the discrepancy between the
data and the solution, the second term is now an
arbitrary "potential" function of the solution (defined
on a discrete lattice). It is then natural to ask for the
connection between standard regularization principles
and the MRF approach. It turns out that a MAP estimate
leads to the minimization of a functional U_P - in
general not quadratic - that reduces to a quadratic
functional, of the standard regularization type, when
the MRF is continuous-valued, the noise is additive
and gaussian (the term Σ_i Φ_i(f_i, g_i) will be quadratic)
and first order differences of the field are zero-mean,
independent, gaussian random variables (thus the a
priori probability distribution is a Gibbs distribution with
quadratic potentials so that the term U_0(f) is
quadratic).

8.2.  The  Fusion  problem 

This approach also permits, in principle, the incorporation
of more than one modality of observations into a
single estimation process, as well as the simultaneous
estimation of several related functions from the same
data set. This makes one hope that this framework
could be useful in the solution of difficult problems that
require such an integrated approach.

For instance, the stereo matching problem in real
situations has not been solved yet in a completely
satisfactory way. The same can be said of other
related perceptual problems such as edge detection;
image segmentation; the recovery of the shape of
an object from a single two-dimensional image (the
"shape from shading" problem), and the segmentation
of a scene into distinct objects, as well as the recovery
of their three-dimensional structure from the analysis
of images formed at successive instants of time (the
"structure from motion" problem). All these problems
are obviously related, and it is intuitively clear that
the individual solutions that can be obtained should
improve if the mutual constraints that the solution
of each individual problem imposes on the others
were taken into account. Thus, the presence of a
brightness edge should increase the likelihood of a
depth edge, and vice versa; the depth estimated from
stereo should be compatible with the shape derived
from shading; points belonging to the same region in
an image should move together, etc. We believe that
these constraints can be incorporated in the potential
functions of the corresponding MRF models, so that
the combined optimal estimation process represents,
in fact, an integrated cooperative solution to these
problems, with, hopefully, a significantly improved
performance.

READING  LIST 


Abend, K. "Compound decision procedures for unknown
distributions and for dependent states of nature,"
in Pattern Recognition, L. Kanal, ed., Thompson
Book Co., Washington, D.C. (1968).

Barrow, H.G. and Tenenbaum, J.M. "Interpreting line
drawings as three dimensional surfaces," Artificial
Intelligence, 17 (1981).

Brady, J.M. Computing Surveys, 14 (1982).

Brown, C.M. Science, 224 (1984).

Besag, J. "Spatial interaction and the statistical analysis
of lattice systems," J. Royal Stat. Soc. B 34, 75-83
(1972).

Cohen, F.S., and D. B. Cooper. "Simple parallel
hierarchical and relaxation algorithms for segmenting
noncausal Markovian random fields," Brown University
Laboratory for Engineering Man/Machine Systems,
Tech. Report LEMS-7 (1984).

Cross, G.C. and A. K. Jain. "Markov random field
texture models," IEEE Trans. Pattern Analysis and
Machine Intelligence, 5 (1983).

Elliot, H., R. Derin, R. Christi, and D. Geman. "Application
of the Gibbs distribution to image segmentation," Univ.
of Massachusetts Technical Report (1983).

Feller, W. An introduction to probability theory
and its applications, Vol I. John Wiley and Sons,
New York (1950).

Gallager, R. G. Information theory and reliable
communication. John Wiley and Sons, New York
(1968).

Geman, S. and D. Geman. "Stochastic relaxation,
Gibbs distribution, and the Bayesian restoration of
images," IEEE Trans. Pattern Analysis and Machine
Intelligence, 6 (1984).

Grenander, U. "Tutorial in Pattern Theory," Div. of
Applied Math. Brown University (1984).

Grimson, W.E.L. From Images to Surfaces. MIT Press,
Cambridge, Mass. (1981).




Grimson, W.E.L. "A computational theory of visual
surface interpolation," Phil. Trans. R. Soc. London B,
298 (1982a).

Habibi, A. "Two dimensional Bayesian estimation of
images," Proc. IEEE 60 (1972).

Hansen, A.R. and H. Elliot. "Image segmentation using
simple Markov field models," Comp. Vision, Graphics,
and Image Proc. 20 (1982).

Hassner, M. and J. Sklansky. "The use of Markov
random fields as models of texture," Comp. Vision,
Graphics and Image Proc. 12 (1980).

Hillis, D. "The Connection Machine." M.I.T. Department
of Electrical Engineering and Computer Science Ph.D.
Thesis (1985).

Julesz, B. "Binocular depth perception of computer
generated patterns," Bell Sys. Tech. J. 39 (1960).

Kashyap, R.L., and R. Chellappa. "Estimation and
choice of neighbors in spatial interaction models of
images," IEEE Trans. on Info. Theory 29 (1983).

Kemeny, J.G., and J. L. Snell. Finite Markov Chains.
Van Nostrand, New York (1960).

Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. "Optimization
by simulated annealing," Science 220 (1983).

Koch, C., J. Marroquin, and A. Yuille. "Analog 'neuronal'
networks in early vision," Massachusetts Institute of
Technology Artificial Intelligence Laboratory Memo 751
(1985).

Marr, D. Vision, A computational investigation into
the human representation and processing of visual
information. W. H. Freeman & Co., San Francisco
(1982).

Marr, D. and T. Poggio. "From understanding computation
to understanding neural circuitry," Neur. Res.
Bull. 15 (1977).

Marroquin, J. "Surface reconstruction preserving
discontinuities," Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo 792 (1984).

Marroquin, J. "Probabilistic solution of inverse problems,"
Ph.D. Thesis, Massachusetts Institute of Technology
(1985).

Metropolis, N. et al. "Equation of state calculations by
fast computing machines," J. Chem. Phys. 21 (1953).

Morozov, V.A. Methods for Solving Incorrectly
Posed Problems. Springer-Verlag, New York (1984).

Nahi, N.E. and T. Assefi. "Bayesian recursive image
estimation," IEEE Trans. on Computers 21 (1972).

Oster, G.F., A. Perelson and A. Katchalsky. "Network
Thermodynamics," Nature 234 (1971).

Poggio, T. "Vision by man and machine," M.I.T.
Artificial Intelligence Laboratory Memo 776 (1984).

Poggio, T. and C. Koch. "Analog networks: a new
approach to neural computation," M.I.T. Artificial
Intelligence Laboratory Memo 783 (1984).

Poggio, T. and V. Torre. "Ill-posed problems and
regularization analysis in early vision," M.I.T. Artificial
Intelligence Laboratory Memo 773 (1984).

Poggio, T., V. Torre, and C. Koch. "Computational
vision and regularization theory," Nature 317 (1985).

Poggio, T., H. Voorhees, and A. Yuille. "Regularising
edge detection," M.I.T. Artificial Intelligence Laboratory
Memo 776 (1984).

Terzopoulos, D. "Multiresolution computation of visible-
surface representations," Ph.D. Thesis, M.I.T. Department
of Electrical Engineering and Computer Science (1984).

Terzopoulos, D. "Integrating visual information from
multiple sources for the cooperative computation of
surface shape," to appear in From Pixels to Predicates:
Recent Advances in Computational and Robotic
Vision, ed. A. Pentland, Ablex (1985).

Tikhonov, A.N., and V. Y. Arsenin. Solutions of
Ill-Posed Problems. Winston and Sons, New York
(1977).

Winston, P. "Proposal to DARPA," M.I.T. (1984).

Wong, E. "Two-dimensional random fields and the
representation of images," SIAM J. Appl. Math. 16, 4
(1968).

Woods, J. W. "Two-dimensional discrete Markovian
fields," IEEE Trans. Info. Theory 18 (1972).




Stereo  Verification  in  Aerial  Image  Analysis 

David M. McKeown, Clifford A. McVay,
and Bruce D. Lucas

Department of Computer Science
Carnegie-Mellon University
Pittsburgh, Pa. 15213


Abstract 

This paper describes a flexible stereo verification system, STEREOSYS,
and its application to the analysis of high resolution aerial
photography. Stereo verification refers to the verification of
hypotheses about a scene by stereo analysis of the scene. Unlike stereo
interpretation, stereo verification requires only coarse indications of
three-dimensional structure. In the case of aerial photography, this
means coarse indications of the heights of objects above their
surroundings. This requirement, together with requirements for
robustness and for dense height measurements, shapes the decision
about the stereo system to use. This paper discusses these design issues
and details the results of an implementation.


Subject Terms: Computer Vision, Stereo Analysis, Stereo
Verification, Aerial Photo Interpretation, Artificial Intelligence,
Knowledge-Based Interpretation, Image/Map Databases


1.  Introduction 

This paper describes a flexible stereo verification system, STEREOSYS,
and its application to the analysis of high resolution aerial
photography. Stereo verification refers to the verification of
hypotheses about a scene by stereo analysis of the scene. Unlike stereo
interpretation, stereo verification requires only coarse indications of
three-dimensional structure. In the case of aerial photography, this
means coarse indications of the heights of objects above their
surroundings. This requirement, together with requirements for
robustness and for dense height measurements, has shaped the
decision about the stereo system to use.

In this research we have attempted to address stereo analysis in a
very unconstrained environment. Rather than simply focusing on
isolated image analysis where stereo pairs are carefully controlled, we
have constructed a system that can automatically perform matching
and analysis using arbitrarily selected images. We are motivated by the
observation that if knowledge-based image understanding systems are
to begin to perform analysis tasks at a level of performance required
for mapping and photo interpretation, they must be able to
accommodate a much broader range of task uncertainty and
complexity than has been previously demonstrated in any research or
development system.

Stereo verification deals with a variety of problems that are not
ordinarily present in isolated experiments with stereo matching and
analysis. Some of the most interesting problems involve:

• The selection of an appropriate conjugate image pair from
a database of overlapping images based on criteria that
would maximize the likelihood of good correspondence.

• The image pairs must be dynamically resampled such that
the epipolar assumption (i.e., epipolars are scan lines) used
in most region-based stereo matching algorithms can be
applied.

• Because the size of the areas to be matched varies greatly,
the system design must be flexible and general.

• An initial coarse registration step is generally necessary
because the quality of the correspondence between
conjugate pairs varies greatly. In many cases the magnitude
of the initial misregistration is greater than the expected
disparity shift.

• The system must analyze the stereo results and generate a
symbolic description that provides an estimate of the actual
height of the region in question, and the confidence of that
estimate. The computation of a depth map (disparity map)
is not a sufficient final result.

These  requirements,  in  turn,  raise  a  set  of  broader  research  issues: 

1. How can an aerial image database be used to automatically
generate a useful stereo pair containing an arbitrary
region?

2. How can a stereo system handle the misregistration
problems inherent in variable sourced image databases?

3. What kind of stereo results are appropriate for use in a
verification process?

4. How can stereo results be analyzed so as to reflect not only
the presence (or absence) of height but also the inherent
reliability of the results?

5. How useful is stereo verification within a knowledge-based
image interpretation system? What constraints does it
provide, and how important is stereo information in
producing accurate scene descriptions?

The results of this research indicate that image/map database issues
in stereo verification influence the utility of such an approach as much
as the underlying stereo matching algorithm. In fact, they are
intimately related. The ability to be flexible in the selection of stereo
pairs provides opportunities for multi-temporal, multi-scale, or
multi-look matching. Equally important is flexibility in the matching
algorithm, especially with respect to assumptions that require nearly
perfectly aligned conjugate images, a situation that is unlikely to occur
outside of the laboratory.

We believe that the ability to dynamically select conjugate image
pairs from a database based upon the region of interest and knowledge
of the requirements of the matching algorithm is required for a fully
automated image analysis system. Our results also indicate that stereo
analysis can function as a very powerful discriminator in an image
understanding system without having to perform 3D shape
reconstruction. That is, coarse estimates of height, coupled with
confidence in those estimates, can greatly constrain search during
image interpretation.


This paper discusses these broader research issues and provides the
reader with an analysis of the results of our experimentation and
details of the actual implementation.


2. Stereo Verification in SPAM

STEREOSYS was developed as a knowledge source for SPAM¹, a
rule-based system that uses knowledge from a variety of sources to
interpret airport scenes in aerial imagery. Many of the requirements
for flexibility in a stereo system arise directly from the fact that
STEREOSYS must interact in a larger context, that of the image
understanding system. As we move from isolated computer vision
experiments to system integration, the performance of particular
components must be evaluated within the constraints and context of
the overall system. SPAM manages and invokes various specialized
low-level image analysis processes that allow it to gather information
about regions in the image. These processes include texture analysis,
feature alignment and grouping³, and depth cue generation. SPAM has
developed along two lines:

• The addition and refinement of knowledge about airports
and procedures for recognition and matching of image-based
descriptions to the airport scene model.

• The addition and refinement of low-level image processes
that support the SPAM control structures by providing
primitive intermediate-level scene descriptions.

STEREOSYS falls into the latter category as it uses stereo to generate a
depth map (disparity image) description given a hypothesis region in
the image. The role of STEREOSYS in the overall system is to verify
hypotheses such as terminal building, access road, tarmac, parking
apron, and hangar by measuring the amount of disparity within a
hypothesis region and thereby estimating the likelihood that the region
is above or at the ground plane. Further, if the region is deemed to be
mostly above the ground, STEREOSYS provides a coarse estimate of the
absolute height above the ground. One may contrast this with
methods for stereo reconstruction that use feature matching or
segment-based techniques: STEREOSYS does not attempt to construct a
precise three-dimensional model of the feature within the scene. For
the tasks that SPAM requires, for example, the verification of a hangar
hypothesis, it is less important to determine the shape of the roof than
to reliably determine whether a roof of some type is present.
The issue of robustness and reliability in aerial image interpretation is
of principal importance since most of the hypotheses generated by the
system will not correspond to features in the scene having significant
height. Therefore, the ability to refute incorrect hypotheses such as
hangar and terminal building by determining there is no apparent
height, as well as to reliably confirm 'no height' hypotheses in areas
such as tarmac and parking aprons, puts performance expectations on
the stereo system that transcend simple stereo matching.

SPAM invokes STEREOSYS as a result of recognizing one of two
situations. First, as a part of low-level information gathering, we might
want to test every region generated by the segmentation system
having certain shape and size properties to determine whether it has
significant height above the ground plane. Second, as a part of
high-level disambiguation, there are a variety of cases where spatial
constraints derived from the rule-based airport model are unable to
distinguish between two competing hypotheses. For example, assume
SPAM has found a conflict between two interpretations, "terminal
building" and "parking lot". Spatial knowledge would allow these
hypotheses to occupy similar spots in the overall scene for a wide
variety of airports and, therefore, would not be able alone to resolve
the conflict. Another common example is compact two-dimensional
regions, such as runup pads and the roofs of maintenance buildings.
Shape and size metrics such as compactness and area provide only
weak cues in this situation. SPAM specifically recognizes situations
where competing hypotheses involve features that can be
disambiguated based upon knowledge of their height relative to their
surroundings. Since we may often be looking at regions that are
primarily at the ground plane, the ability to reliably determine that
there is no apparent height difference between the region and its
neighborhood is equally important.


Figure 2-1: Stereo Verification

In either case, the stereo verification process can be characterized as
follows:

1. Given a region X1 within a geographic area A1 from image
I1, find an appropriate second image I2 that contains a
geographic area A2 that is the same as geographic area A1.
STEREOSYS has access to a database of images through
primitives provided by the MAPS system.

2. Image fragments A1 and A2 are rectified (warped) and
registered (shifted/rotated) into a stereo pair of overlaying
geographic rectangles W1 and W2.

3. The W1-W2 stereo pair is processed and the result is
analyzed in order to compute confidence values that
measure the height of X1 relative to its surroundings along
with the system's overall confidence in the stereo result.

In the remainder of this paper we will discuss the stereo matching
algorithm, how STEREOSYS uses this algorithm to perform stereo
verification, and some experimental results that illustrate the strength
of this technique as well as some of the more interesting pragmatic
problems encountered in complex aerial imagery. Section 3 describes
the basic stereo matching process used by STEREOSYS. Section 4
describes the interaction and communication between the image
analysis system, SPAM, the image/map database system, MAPS, and the
stereo verification system, STEREOSYS. Section 4 also gives the
sequence of steps necessary to apply the stereo algorithm to an
arbitrarily selected region of an image. Section 5 shows examples of
preliminary experiments with S1: the effects of good and poor initial
correspondence estimates, the effect of the fine registration step on the
subsequent matching, and the evaluation of STEREOSYS over many test
regions. Section 6 overviews the strengths and limitations of this work,
and suggests future research directions.

3. The Stereo Process

STEREOSYS uses a stereo matching program, S1, described in detail
elsewhere⁷. In this Section we will review this stereo matching
algorithm. S1 produces a disparity image (map) that is registered to the
Left stereo pair image and whose pixel values indicate the film plane
displacement of matched points in the stereo pair. The disparity value
is in one-to-one correspondence with distance, or depth, from the
camera and therefore indicates relative height in vertical aerial
photography. The process, in effect, correlates neighborhoods about
every pixel, but uses the method of differences to avoid costly
exhaustive searches.


3.1. Method of Differences

Let $I_1(x,y)$ and $I_2(x,y)$ denote the two images of a stereo pair, and let
$h(x,y)$ denote the disparity map. Then the values of the disparity map
are a statement that the point $(x,y)$ in $I_1$ matches the point
$(x + h(x,y),\, y)$ in $I_2$; that is, that

$$I_1(x,y) = I_2(x + h(x,y),\, y)$$

Let $\hat{h}(x,y)$ denote the correct disparity map. The process begins with a
uniform disparity map $h_0(x,y)$, and successively updates the disparity
map, yielding $h_1$, $h_2$, etc. Ideally, as successive refinements proceed,
$h_k \to \hat{h}$.

Consider a point $(x,y)$ in the left image of the stereo pair; the
difference $\hat{h}(x,y) - h_0(x,y)$ between the correct disparity value and our
initial estimate is the amount by which the stereo process must correct
the disparity in going from $h_0$ to $\hat{h}$. Initially this difference will be
relatively large because the uniform disparity estimate is not
particularly accurate. Because of this, the method of differences
requires that we start out with smoothed images to accommodate these
large differences. As the disparity estimate $h_k$ improves we can use
less smoothed images because the error between $h_k$ and $\hat{h}$ decreases.

Suppose we have computed a disparity map $h_k$; that is, we estimate
that the point $(x,y)$ in $I_1$ matches the point $(x + h_k(x,y),\, y)$ in $I_2$. To
compute $h_{k+1}$, we wish to adjust the disparity at each point $(x,y)$ by an
amount $\delta(x,y)$ so that the difference between the images is made as
small as possible, that is

$$I_1(x,y) - I_2(x + h_k(x,y) + \delta(x,y),\, y)$$

is minimized. Minimizing this quantity directly involves a costly
search over the possible values of $\delta$. Instead, the method of
differences estimates this quantity by using derivatives:

$$I_1(x,y) - \bigl( I_2(x + h_k(x,y),\, y) + \delta(x,y)\, D_x I_2(x + h_k(x,y),\, y) \bigr)$$

where $D_x$ denotes the derivative w.r.t. $x$.

This quantity is linear in $\delta(x,y)$, as illustrated in Figure 3-1. It could be
minimized directly, but we get better results by combining many such
estimates from each point in the neighborhood of $(x,y)$ using a least
squares technique, and then minimizing. In any case, the estimate
based on derivatives is valid only over a range around $x + h_k$ on the
order of the size of the averaging window that has been used to smooth
the image. But to be useful we require that this estimate be accurate
over a range of at least $\delta$, the discrepancy between the actual disparity
and our disparity estimate. Thus because the initial disparity error is
large, we must start with relatively smoothed images. For example,
some of our images require an adjustment on the order of 15 pixels
between the initial disparity estimate and the actual disparity, and so
STEREOSYS begins with 32 by 32 smoothing windows.


3.2. Some Pragmatic Issues in Stereo Matching

S1 is also capable of computing a global registration shift between a
stereo image pair, also by the method of differences. That is, a global
offset can be obtained that indicates how much one image is translated,
or shifted, relative to the other. This capability can often salvage the
analysis of misregistered stereo pairs and is very attractive for use with
SPAM since the underlying MAPS database does not have the image
control necessary to guarantee accurately registered stereo pairs.

S1 does not involve the use of sensitive feature extraction thresholds.
Stereo matching in S1 is accomplished for every pixel and is not
restricted to selected image features such as interesting areas⁸, edges⁹⁻¹⁰
or other extracted features¹¹. Limiting a stereo procedure to
matching extracted image features makes the process sensitive to the
extraction technique and its associated thresholds. Since SPAM will be
using a stereo process over a wide range of images and regions, such
extraction thresholds should be avoided wherever possible.

Another issue in the selection of S1 for use by SPAM has to do with
the fact that SPAM is not using stereo to recognize objects or build
conceptual models from the stereo results. SPAM simply wants to know
if the region of interest has height relative to its surroundings. A dense
disparity image registered to the image containing the region of
interest is an ideal source of data for the analysis necessary to do
simple height verification. Almost all other stereo processes we are
aware of produce sparse disparity results designed for purposes other
than verification. Work by Panton¹¹ and Henderson¹¹ provides
possible exceptions.

In summary, unlike many other stereo processes, S1 is not overly
reliant on perfectly registered stereo pairs taken simultaneously by well
parameterized cameras, nor does it require threshold tweaking to
accommodate matching of edges or vertices. It produces an easily
analyzed dense disparity image. S1 was chosen for use in stereo
verification because these properties coincide well with the aerial
image analysis domain that SPAM addresses.
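As a rough illustration of the disparity refinement described in
Section 3.1, the following Python sketch performs the least-squares
update along scanlines and reduces the smoothing as the estimate
improves. It is not the S1 implementation: the function names
(`box_filter`, `sample_rows`, `refine_disparity`, `estimate_disparity`),
the odd window sizes, the number of passes, and the use of NumPy with
linear interpolation are all assumptions made for illustration.

```python
import numpy as np


def box_filter(image, size):
    """Mean filter over an odd size-by-size window (borders replicated)."""
    pad = size // 2
    kernel = np.ones(size) / size
    padded = np.pad(image, pad, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def sample_rows(image, xs):
    """Sample each scanline of `image` at the fractional columns in `xs`."""
    cols = np.arange(image.shape[1])
    return np.stack([np.interp(x, cols, row) for x, row in zip(xs, image)])


def refine_disparity(i1, i2, h, window):
    """One update h_k -> h_{k+1} using derivatives instead of a search."""
    s1, s2 = box_filter(i1, window), box_filter(i2, window)
    xs = np.arange(i1.shape[1])[None, :] + h            # x + h_k(x, y)
    warped = sample_rows(s2, xs)                        # I2(x + h_k, y)
    deriv = sample_rows(np.gradient(s2, axis=1), xs)    # D_x I2(x + h_k, y)
    error = s1 - warped                                 # I1 - I2(x + h_k, y)
    # Least squares over the neighborhood of each pixel:
    # delta = sum(error * deriv) / sum(deriv ** 2)
    num = box_filter(error * deriv, window)
    den = box_filter(deriv * deriv, window)
    return h + num / (den + 1e-9)


def estimate_disparity(i1, i2, windows=(15, 9, 5), passes=4):
    """Start from a uniform (zero) map; smooth less as h improves."""
    h = np.zeros(i1.shape, dtype=float)
    for window in windows:
        for _ in range(passes):
            h = refine_disparity(i1, i2, h, window)
    return h
```

Calling `estimate_disparity(left, right)` on a pair that satisfies
$I_1(x,y) = I_2(x + h, y)$ returns a dense map registered to the left
image. Estimates near the image borders are unreliable in this sketch
because samples outside the image are clamped to the edge values.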


4. Using Stereo Verification with an Aerial Image
Database

Certain steps are necessary for a stereo process to work automatically
as a verification procedure in association with a database of aerial
imagery. A block diagram is given in Figure 4-1 that outlines the
procedure and shows the interactions between the Image Analysis
Process (SPAM) and the Image Database (MAPS). We can loosely
organize these steps, beginning with the identification of a region of
interest by the image analysis process, as:

1. Select Coverage: Determine the available alternate images
that cover the region of interest. Select the most
appropriate alternative(s).

2. Extract the Stereo Pair: Extract a stereo pair of the region
from the image coverage selected.

3. Register the Stereo Pair: Compensate for misalignment
errors inherent in the aerial image database. Do any other
processing necessary to assure the stereo pair meets any
assumptions made by the stereo process.

4. Run the Stereo Process: Apply some stereo matching
process (e.g., S1).

5. Analyze the Results: Analyze the stereo results in order to
verify if the region of interest has height relative to its
surroundings.


Figure 3-1: Estimating Disparity


Figure 4-1: The Stereo Verification Process

The STEREOSYS process is initiated by SPAM with parameters
identifying the region of interest, the database image that is being
interpreted and contains the region, and an estimated height range (0-5
meters, 0-15 meters, 10-20 meters, etc.) for the region.

Using the identity of the region of interest, STEREOSYS extracts the
region's centroid, its boundary point list and an associated minimum
bounding rectangle (MBR) from the MAPS database. This data is used
in determining alternate imagery coverage, in extracting the stereo
pair, and in analyzing the stereo results.

The MAPS database is used to produce an unsorted list of images,
called a coverage file. Each image in the coverage file contains the
region of interest. The image being interpreted by SPAM and an
image from the coverage file form the stereo pair.


The estimated height is used to select a disparity range that affects
the contrast of the disparity image produced by the S1 algorithm. The
resulting disparity image is quantized to 256 disparity levels. If this
range is set too large, the disparity image will lack contrast and will be
more difficult to analyze; if it is set too small, extremely large height
disparities will occur outside the image range and will effectively be
invisible. In other words, the initial disparity range determines the
scaling of measured disparity into the disparity image. As in any linear
scaling operation one would like to utilize the full dynamic range of
the output image while avoiding clipping at either end of the range.
The selection of the disparity range constitutes the only external
parameterization necessary in the implemented process. Our
experience has shown that the disparity range need only be within a set
of rather broad values to obtain useful results. For now, we use only
three pre-selected ranges. Since SPAM actually selects the disparity
range based on its region hypothesis, there is potential to add ranges to
accommodate additional hypothesis types or to run the stereo process
over a set of disparity ranges.
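The trade-off between contrast and clipping can be made concrete with
a small sketch of a linear scaling into 256 levels. The function names
`quantize_disparity` and `clipped_fraction`, and the idea of reporting
the clipped fraction, are assumptions for illustration; the paper does
not state how S1 performs this scaling internally.

```python
import numpy as np


def quantize_disparity(disparity, d_min, d_max, levels=256):
    """Linearly scale disparities in [d_min, d_max] onto 0..levels-1.

    Values outside the range are clipped to the extreme levels, so they
    can no longer be told apart from the ends of the range.
    """
    d = np.asarray(disparity, dtype=float)
    scaled = (d - d_min) / (d_max - d_min) * (levels - 1)
    return np.clip(np.rint(scaled), 0, levels - 1).astype(np.uint8)


def clipped_fraction(disparity, d_min, d_max):
    """Fraction of measurements that fall outside the chosen range."""
    d = np.asarray(disparity, dtype=float)
    return float(np.mean((d < d_min) | (d > d_max)))
```

A range that is too wide maps all measured disparities onto a few
neighboring levels (low contrast), while a range that is too narrow
raises `clipped_fraction`. Checking both for each of the pre-selected
ranges is one way such a choice could be evaluated.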

The following Sections will discuss these procedural steps in more
detail and describe how STEREOSYS implements them. Some details
are specific to the S1 matching algorithm used by STEREOSYS but are
mentioned so that the reader may better understand our results.


4.1. Select Coverage

The MAPS database is used to produce an unsorted list of images,
called a coverage file. Each image in the coverage file contains the
region of interest. The interpretation image is used to create the Left
stereo pair image since the S1 disparity image result overlays the Left
image and, as will be seen, since there is no guarantee that the stereo
image extracted from the alternate image will be properly registered,




the region. The coverage file is used to select the database image from
which the Right stereo pair image will be extracted. However, in most
cases the coverage file lists several images that contain the region in
question. Several considerations enter into the choice of the best
candidate. First, to minimize resampling extrapolation the candidate
should be of the same or larger scale. Second, to reduce possible
perspective distortion, the candidate should have the region of interest
as near to its center as possible. In the case of vertical aerial
photography this is the region's nadir distance. Third, if possible, the
candidate should be from the same photographic mission, even flight
line, as the original image to reduce temporal changes such as
cloud cover and ground movement. Figure 4-2 illustrates a pair of
mapping aircraft flightlines that generate stereo coverage between
successive frames of the same flightline as well as between adjacent
flightlines. Figure 4-2 also illustrates that small changes in the aircraft
platform position and direction can affect the actual area of overlap
and must be accommodated: one cannot assume a constant direction
or viewing position. This is discussed in Section 4.2.

Other issues such as the source of the image, its recency, and the
processing and digitization history can enter into the selection of the
images used to produce the stereo pair. For our purposes, STEREOSYS
sorts the coverage file into a best stereo coverage order with respect to
the hypothesis region's originating image as follows:

• Same Mission images (sorted by nadir distance)

• Same Scale images (sorted by nadir distance)

• All Other images (sorted by nadir distance)

The first image in the sorted coverage file best satisfies these criteria
and is used to create the Right image.
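The ordering above amounts to a two-level sort key. The sketch below
shows one way to express it; the `CoverageImage` record, its field
names, and the decision to drop the interpretation image from the
candidates are hypothetical and are not taken from the MAPS database
interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageImage:
    name: str
    mission: str
    scale: int             # scale denominator, e.g. 12000 for 1:12000
    nadir_distance: float  # distance of the region from the image nadir


def sort_coverage(coverage, reference):
    """Order candidates: same mission, then same scale, then all others,
    with each group sorted by nadir distance."""
    def key(image):
        if image.mission == reference.mission:
            tier = 0
        elif image.scale == reference.scale:
            tier = 1
        else:
            tier = 2
        return (tier, image.nadir_distance)

    # Assumption: the image being interpreted is not its own stereo mate.
    candidates = [image for image in coverage if image.name != reference.name]
    return sorted(candidates, key=key)
```

The first element of the returned list would play the role of the
image used to create the Right image.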


4.2.  Extract  the  Stnrno  Pair 

The  extraction  of  the  stereo  pair  images  is  not  a  simple  summage 
cropping  procedure.  Like  almost  all  stereo  algorithms,  SI  assumes 
image  scanlines  in  the  stereo  pair  arc  stereo  cptpolar  lines.  Without 
rotation  this  will  not  be  the  ease  with  the  selected  l  eft  and  Right 
images.  Photographic  mission  flight  lines  need  not  align  with  image 
digitization  scanlincs  and.  even  if  they  did.  sometimes  the  best 
coverage  is  found  across  mission  flight  lines  or  even  from  separate 
missions.  For  these  reasons,  a  baseline  orientation  between  the  stereo 
pair  is  calculated  so  that  the  pair  can  be  rotated  to  properly  align  the 
scanlincs  to  meet  the  cpipolar  constraint 


Several  issues  arc  considered  in  determining  the  sire  of  the  unage 
area  to  be  extracted.  First,  the  area  most  contain  the  region  s  MBR  plus 
a  portion  of  the  surrounding  area  since  the  si  stereo  rcsulj  >>  _Y 
-pruain  relative  height  mfcrtnation.  In  addition,  the  extracted  arc. 
must  be  large  enough  so  mat  the  region  of  interest  n  container 
rectangular  sub-tmagt  cropped  from  the  routed  image. 

Specifically,  to  r.oduce  the  necessary  stereo  pair.  smtOSYS  extracts 
orEotally  'rectified  areas  identified  as  North-South  oriented 
geographic  :ccung!es  by  sub-pixel  interpolation  .  rhe  c^c'^  ^e 
extraction  wangle  arc  calculated  as  a  futKUon  of  the  regions 
Sd  die  region’s  Milk.  the  I  .eft- Right  image  scales,  and  the 
rotation  necessaoi  u>  make  the  extracted  unage  East-W«  scanltne. 
align  with  the  baseline  between  the  database  coverage  images. 

4.3.  Regl.ter  th*  Stereo  Pelr  .  .  .  ^ 

As  mentioned  k.  station  3.  si  is  capableof 
disparity  v  ofTsm  -  tween  stereo  purv  Usinu  this  Si  ca^btUy,  the 
initially  extracted  stereo  sub-image  ^airs  arc  ^P^^P*****1  * 
Si  to  determine  the  local  horizontal  and  vc,^al  * 

Left  and  Right  images.  With  each,  pass  over  the  wage  P^-  » 
calculates  a  global  of.cst  value  between  the  images,  rhe  P«*«*  » 
repeated  and  die  offset  compounds  unul  the  offset 
tooscillate.  Culculutian  of  the  registration  offset  is  necessary  became 
gc^euc  pnsitiun  correspondence  control  between  images  stored  m  he 
mshs  database  is  not  sufficiently  accurate  to  guarantee  that  the 
extracted  Right  image  will  overlay  the  l-cft  image  within  the  tolerances 
over  which  SI  can  perform  effective  matching  As  mentioned  eariicr. 
in  many  ease-  the  initial  registration  errors  may  range  from  5  to  Jt) 
pixels  while  the  disparity  shift  is  generally  smailc-  than  10  pixels. 

One  can  view  the  stereo  matching  process  as  first  applying  a  cc*1'** 
registration,  followed  by  the  actual  calculation  of  d«P«Uy.  K  B 
interesting  to  note  Out  the  same  icch'.iquc.  method  of  difference*, 
appears  to  be  effective  for  bow  global  registration  and  local  matching. 
A  possible  alternative  lo  this  registration  step  would  be  the  addition  or 
sufficient  ground  control  to  assure  that  images  in  the  maps  daufrase 
could  be  registered  within  acceptable  tolerances  of  2  to  4  pixels 
However,  given  that  the  ground  sample  distance  for  many  of  the 
images  is  approximately  one  meter,  and  that  maps  contains  a  ride 
variety  of  imagery  with  difference  ground  scales,  projecuons,  from 
multiple  sources,  it  ht  unlike'y  that  one  would  be  able  to  totally 

»Kj»  initial  rpffitjralin  CfTOf. 


However this necessary rotation doesn't correct for distortions due
to non-parallel camera axes. Even if the stereo process is sophisticated
enough to account for large amounts of perspective distortion,
chances are it will not be able to account for these distortions after they
have been rotated. Therefore, the stereo pair Left-Right images are
extracted through an orthographic rectification process before they are
rotated. This method of subimage extraction removes perspective
distortions by warping the subimage into a rectangular geographic box
as well as establishing a common orientation for the image scanlines.


The calculated registration offset is then used to extract the Right
image for a second time. The orthographic extraction process is given
a new geographic box that has been translated by the calculated offset.
In this way, the new Right image will be more nearly registered to the
Left image than if we had simply translated the original Right image.
Originally we felt the offset could be handled entirely within the SI
stereo process and that resampling the Right image would be
unnecessary. Experimentation showed this not to be the case, but
the offset capability was already added to SI. It is still in
use. That is, even though we calculate an image offset and resample
the Right image for a second time, we still later calculate any
remaining offset between the Left and resampled Right images and
use that value within the stereo process itself.
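
A minimal sketch of translating the geographic box before the second extraction is given below. `GeoBox` and `translate_box` are hypothetical names, the box is assumed to live in a planar local grid measured in meters, and the sign convention (positive offsets move the box east and north) is an assumption rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoBox:
    """Axis-aligned geographic box in a planar local grid (meters)."""
    west: float
    south: float
    east: float
    north: float


def translate_box(box, dx_pixels, dy_pixels, gsd=1.0):
    """Shift the box by a pixel offset scaled by the ground sample distance."""
    dx = dx_pixels * gsd  # meters east (assumed sign convention)
    dy = dy_pixels * gsd  # meters north (assumed sign convention)
    return GeoBox(box.west + dx, box.south + dy, box.east + dx, box.north + dy)


shifted = translate_box(GeoBox(0.0, 0.0, 100.0, 50.0), 13, 7, gsd=1.0)
```

Re-extracting from the shifted box resamples the Right image from the original frame, rather than shifting pixels that have already been resampled once.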

If necessary, the resulting Left-Right stereo pair images are rotated.
The SI stereo process assumes that scanlines are parallel to epipolar lines.
Until this point the stereo pair scanlines were East-West. Earlier a
rotation value was calculated for use in determining the size of the
extraction area. The rotation value is the amount the images must be
rotated to make the epipolar lines become scanlines and assure that the
Left-Right pair create a positive stereo image (i.e., tall objects shift
inward). The rotation value is the baseline orientation that was
calculated earlier as the angle at the geographic center of the original
image between East and the line to the alternate image geographic
center. After rotation, the appropriate subimage rectangle of real data
is cropped from the rotated image since the rotation leaves four right
triangles of non-data at the corners.
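
The baseline orientation defined above, the angle at one image center between East and the line to the other image center, can be sketched as below. The function name and the (easting, northing) tuple convention are assumptions, and the angle is measured counter-clockwise from East, which the text does not specify.

```python
import math


def baseline_orientation(center, other_center):
    """Angle in degrees, counter-clockwise from East, of the line
    from center to other_center; points are (easting, northing)."""
    d_east = other_center[0] - center[0]
    d_north = other_center[1] - center[1]
    return math.degrees(math.atan2(d_north, d_east)) % 360.0


due_east = baseline_orientation((0.0, 0.0), (100.0, 0.0))
due_north = baseline_orientation((0.0, 0.0), (0.0, 50.0))
due_west = baseline_orientation((0.0, 0.0), (-10.0, 0.0))
```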


4.4.  Run  the  Stereo  Process 

At this point all constraints required by the SI algorithm on the
stereo pair have been met. The following few comments concern the
specific use of the SI process.

The Left-Right stereo pair images are repeatedly smoothed to form
the coarse-fine hierarchy of images used by SI. As in Section 4.3, SI
again calculates a global registration offset value between the original
Left image and the resampled Right image. This global offset is used
internally by SI during its calculation of the disparity image. The
disparity image result is saved for analysis upon completion of the SI
disparity process.


4.5. Analyze the Results

In general the methods used in analyzing stereo results will depend
on the stereo process used, the sensing method, and the type of
disparity map produced by the process. Generally, one can
characterize stereo matching results as one of the following:

•  point correspondences

•  sparse depth map

•  dense (complete) depth map

The objective of stereo verification is to determine if the region of
interest has height relative to its surroundings. One of the major
reasons for choosing SI as our stereo process is that its dense disparity
image simplifies this analysis step. Analysis of sparse feature-based
depth results like those produced by edge-based or interest area-based
stereo processes would require careful determination of whether a
feature belongs to the region of interest or to its surroundings. One
obvious method would be to interpolate the sparse depth results into a
dense map similar to the SI disparity image. However it is not clear
how reliable such a map would be, especially given the complex
images presupposed in aerial interpretation, and techniques for doing
such interpolation are still considered a topic for research [15]. The
remainder of this Section describes how STEREOSYS analyzes SI
disparity images and is illustrated by several examples.

In order to analyze the dense SI disparity image an overlaying
bitmap of the region of interest is made. First the region's boundary
point list is rectified to overlay the pre-rotated Left stereo pair image.
The rectified boundary point list is then converted to a bitmap image
of the region. Finally the bitmap is rotated to properly overlay the
Left stereo pair image used in the disparity image calculation. The
bitmap is used to distinguish the areas of the disparity image inside
and outside the region of interest.
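
The conversion of a boundary point list into a region bitmap can be sketched as below. The text does not say how the conversion is done; this sketch swaps in a plain even-odd point-in-polygon test at pixel centers, and `polygon_to_bitmap` is a hypothetical name. Rectification and rotation of the boundary are assumed to have been applied already.

```python
def polygon_to_bitmap(boundary, width, height):
    """Rasterize a closed boundary (list of (x, y) vertices) into a 0/1 bitmap
    using the even-odd rule, sampled at pixel centers."""
    bitmap = [[0] * width for _ in range(height)]
    n = len(boundary)
    for row in range(height):
        y = row + 0.5
        for col in range(width):
            x = col + 0.5
            inside = False
            for i in range(n):
                x1, y1 = boundary[i]
                x2, y2 = boundary[(i + 1) % n]
                # Does this edge cross the horizontal line through the pixel center?
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            bitmap[row][col] = 1 if inside else 0
    return bitmap


square = polygon_to_bitmap([(1, 1), (4, 1), (4, 4), (1, 4)], width=5, height=5)
```

The per-pixel test is quadratic in image size times boundary length, which is acceptable for small subimages; a scanline fill would be the usual choice for larger ones.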

The disparity image and the overlaying region bitmap are used to
calculate the mean and standard deviation for the disparity values of
the areas within and without the region of interest. STEREOSYS uses a
heuristic function that combines the standard deviations, S_in and
S_out, and the difference in the means, D, to determine four confidence
values:

1.  Overall Confidence in the Stereo Results

2.  Confidence in the Region having Little to No Height

3.  Confidence in the Region having Moderate Height

4.  Confidence in the Region having Significant Height
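
The region statistics that feed these confidence values can be sketched as follows. `region_statistics` is a hypothetical name; using the population standard deviation and defining D as the inside mean minus the outside mean are assumptions of this sketch, since the text fixes neither choice.

```python
from statistics import mean, pstdev


def region_statistics(disparity, bitmap):
    """Mean and standard deviation of disparity inside and outside the region.

    disparity and bitmap are equally sized lists of rows; bitmap cells are 0/1.
    Raises statistics.StatisticsError if either area is empty.
    """
    inside = [d for d_row, b_row in zip(disparity, bitmap)
              for d, b in zip(d_row, b_row) if b]
    outside = [d for d_row, b_row in zip(disparity, bitmap)
               for d, b in zip(d_row, b_row) if not b]
    mean_in, mean_out = mean(inside), mean(outside)
    return {
        "mean_in": mean_in,
        "std_in": pstdev(inside),
        "mean_out": mean_out,
        "std_out": pstdev(outside),
        "D": mean_in - mean_out,  # assumed sign: inside minus outside
    }


stats = region_statistics(
    [[1, 1, 1], [1, 5, 1], [1, 1, 1]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
)
```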

The first measure describes the overall confidence that can be placed
on the stereo results. The disparity image results can vary from
excellent to useless due to limits in correcting for misregistration and
from noise caused by nondescript areas (Section 5.1). The confidence
in the result is calculated as an empirically weighted sum of the mean
difference and standard deviations:

    w_D D + w_in S_in + w_out S_out

where the w terms are the empirical weights.

The D term is further influenced by the disparity image contrast, which
is related to the disparity range. A very small range can decrease this
term by an empirical factor of 0.2. The S term is further influenced
by an estimate of the amount of expected height clutter in the area. If
the area is expected to be cluttered with tall objects this term increases
by an empirical factor of 0.75. Both the disparity range and clutter
values are provided by the processing context that caused SPAM to
invoke STEREOSYS. These contexts include rules that recognize
situations where height information can disambiguate competing
hypotheses as well as supply likelihoods of clutter and height.

Confidence values (2-4) measure whether the region of interest was
found to fall in one of three disparity or height ranges, provided by
SPAM. These measures are relative to the hypothesized disparity range,
rather than absolute statements about the region's height. For
example, "Little to No Height" could mean about 5 meters high if a
very large disparity range was selected but could mean less than one
meter if a small range was used.

All three height confidence values are based on the difference in the
means, D, but can be influenced by the disparity range in a manner
similar to the D term in the results confidence described above. These
values reflect where the height of the region falls within the height
range supplied by SPAM. Confidence (2) is maximized as D goes to
zero. Confidence (3) is maximized when D is approximately 1/2 of the
full disparity range. Confidence (4) is maximized as D goes to
maximum disparity. It should be remembered that very high objects
can create disparity values beyond the range of maximum disparity in
which case their extreme height would go unnoticed.
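
One way to picture three confidences that peak at zero, mid-range, and maximum disparity is a set of triangular membership functions, sketched below. This is not the STEREOSYS heuristic, whose functional form is not given; the piecewise-linear shape, the placement of the moderate peak at the midpoint of the range, the clamping of D to the range, and the name `height_confidences` are all assumptions of this illustration.

```python
def height_confidences(d, max_disparity):
    """Return (low, moderate, significant) confidences in [0, 1] for a mean
    disparity difference d, relative to the full disparity range."""
    t = min(max(d / max_disparity, 0.0), 1.0)  # position within the range
    low = max(0.0, 1.0 - 2.0 * t)              # peaks as d goes to zero
    moderate = 1.0 - abs(2.0 * t - 1.0)        # peaks at the midpoint (assumed)
    significant = max(0.0, 2.0 * t - 1.0)      # peaks at maximum disparity
    return low, moderate, significant


at_zero = height_confidences(0.0, 10.0)
at_mid = height_confidences(5.0, 10.0)
at_max = height_confidences(10.0, 10.0)
```

Because every value is expressed as a fraction of the supplied range, the same D can score as low under a wide range and as significant under a narrow one, which is the relative behavior the text describes.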


5. Experimental Results

This Section presents results produced by STEREOSYS that illustrate
several of the important issues encountered during system
development. We also amplify comments made in previous Sections
concerning issues of registration, disparity estimates and automating
the overall stereo process. It is important to keep in mind several
issues regarding the SPAM task environment and these experiments.
First, all of the aerial mapping photography in the MAPS image
database is nominally vertical. Since each image is in
"correspondence" with a ground control database, it is possible to
compute the geographic coordinate for each pixel in the image. Of
course, there are inherent inaccuracies in this process, both in
measurement of the landmark positions and in recovering their
position in the imagery. These inaccuracies lead to image offsets when
the sub-image areas are extracted from the full image frames using
geographic location. No assumptions are made with respect to the
relative position of the cameras other than those described in Sections
4.1 and 4.2.



Second, the actual height of the region of interest is not calculated,
that is, we do not solve the full camera equations, since the disparity
image is scaled to record a particular range of elevations as described in
Section 4. One could calculate the baseline distance, and knowing the
focal length and aircraft height solve for the actual height, but given
the statistical nature of our final analysis we have not found it to be
necessary. Finally, the ground sample distance for the imagery
reported on in this paper is approximately one meter per pixel.
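
For reference, the height calculation that the text says is not performed can be sketched with the standard parallax relation for vertical photography, h = H d / (B + d), where d is the disparity expressed in ground units. This is a textbook relation under idealized assumptions (truly vertical images, equal flying heights, flat datum), not part of STEREOSYS; the function name and the example numbers are hypothetical.

```python
def height_from_disparity(disparity_px, gsd_m, baseline_m, flying_height_m):
    """Object height in meters from a disparity difference in pixels.

    disparity_px:    disparity of the object top relative to the ground
    gsd_m:           ground sample distance (meters per pixel)
    baseline_m:      distance between the two exposure stations
    flying_height_m: camera height above the ground datum
    """
    d = disparity_px * gsd_m  # disparity in ground units
    return flying_height_m * d / (baseline_m + d)


# Hypothetical numbers, chosen only to exercise the formula.
example_height = height_from_disparity(5, 1.0, 1000.0, 3000.0)
```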

Section 5.1 describes typical SI results before minor revisions were
made to the matching algorithm and STEREOSYS was implemented.
Section 5.2 illustrates the problem caused by database image
misregistration and results produced by STEREOSYS. Section 5.3 deals
with stereo pair preparation processes as well as how the SI results are
analyzed. Section 5.4 describes test results from the automated use of
STEREOSYS. Finally, Section 5.5 details a specific example from among
the automated tests of Section 5.4.

Figure 5-1 shows one frame of aerial imagery containing National
Airport in Washington, D.C. All of the examples in this paper come
from various areas of this airport extracted from several stereo image
pairs.


5.1. Preliminary SI Experiments

Before trying to build a stereo verification system using SI, we
experimented with the overall process in order to get a feel for how SI
might perform with MAPS images. Several issues arose: how to
automatically set SI's initial disparity range values; deciding on
modifications to provide the flexibility necessary to accommodate
SPAM's requirements for a verification process; and how to analyze SI
results. These first experiments were performed on stereo pair images
registered by hand and extracted from the database using the same
orthographic rectification process as in STEREOSYS.

Figure 5-2 shows the Left and Right stereo images of a long hangar
building running diagonally from the top right to bottom left of each
image. Below the Left image is the SI disparity image result. Within
the disparity image dark areas are closer to the camera (higher) than
are the light areas. The hangar is clearly shown to be higher than its
surroundings. Some points of interest concerning the disparity image
are:

•  The speckled areas are caused by the loss of
correspondence in large nondescript areas such as the large
solidly shaded areas of pavement to the right and below the
hangar. Such nondescript areas are characterized by the
lack of edges or texture.

•  Boundary edge effects show up as errors all around the
disparity image. These effects are caused by lack of data
outside the image and have been alleviated somewhat in
the modified versions of SI.

•  Stereo aliasing effects probably caused the problem with
the curved hangar roof in the lower left corner. The white
area in the roof indicates a concave section where none
exists.

•  Temporal changes in the stereo pair images can cause
unpredictable results. An example is the white area along
the right side caused by the moving truck.

Figure 5-3 shows a taxiway/runway area of the airport. This area
contains very little variation in height and contains large variations in
image intensity. The disparity image shows no significant height for
any image region but again illustrates the problems with large,
nondescript areas and edge effects. Note also that the edge effects are
propagated into nondescript areas. The statistical analysis method
described in Section 4.5 was chosen partially because of its ability to
recognize these situations as not being a significant indication of
elevation.


5.2.  Registration  Problem  and  Solution 

The results of the previous Section were produced from stereo pairs
that were registered by hand. That is, the identification of the
extraction areas was not done automatically and any misregistration in
the stereo pair was kept to less than two pixels. This can be contrasted
with the 6 to 15 pixel disparities we normally experienced in the
images used of the Washington D.C. National Airport.

Through experimentation it was found that the SI process could
sometimes produce fair results if the stereo pair was up to 6 pixels
misregistered, but this was found to be far too restrictive for automatic
purposes since the MAPS correspondence between database images can
be off by as much as 30 pixels or more in areas with little ground
control. Figure 5-4 shows early SI results from a stereo pair created
automatically from the database.

The misregistration problem is handled by SI's ability to calculate a
global disparity shift between pairs of images. STEREOSYS uses SI to
calculate the shift between the originally extracted stereo pair then uses
the shift value to re-extract the Right image. Figure 5-5 illustrates this
process. The top two images are the original Left-Right stereo pair.
The lower right contains the Right image after a calculated shift of 7
pixels vertical and 13 pixels horizontal has been eliminated.

Since the shift is an inexact statistical value, SI was also modified to
calculate and compensate for any remaining small misregistrations.
The lower left of Figure 5-5 contains the disparity image that results
from the combination of these techniques. This approach has
demonstrated the ability to properly compensate for original
misregistrations of up to 25 pixels. Beyond that point the global shift
calculation normally fails. However this shortfall can be properly
overcome by adding enough control to the image database to assure
misregistrations will not exceed the limits of the registration process.


5.3. Analysis of Results

If the reader looked carefully at the stereo pair used in the last
Section they might have noticed that the pair forms a negative stereo
image. That is, objects with height lean away from one another and, if
viewed in stereo, would form a reversed stereo image. In such an
image buildings would appear to go down into the ground. To correct
this either the Left and Right images could be exchanged or both
could be rotated 180 degrees. We choose to rotate the stereo pair
images since this can be combined with the arbitrary rotations
necessary to align scanlines and epipolar lines. The results in Figure
5-6 show the rotated stereo pair of Section 5.2. Note that the disparity
images shown in Section 5.2 were produced after exchanging the
images to form a positive stereo pair or else the disparity image would
have shown negative height for the buildings.

In order to analyze the SI disparity images an overlaying bitmap of
the region of interest is produced as described in Section 4.5. The
bitmap is used to distinguish areas of the disparity image as being
either inside or outside the region of interest. Based on this separation,
the mean and standard deviation of disparity values within and
without the region are calculated. STEREOSYS uses the standard
deviations and the difference in the means in its heuristics that
determine the stereo verification confidence values also described in
Section 4.5. These values reflect confidence in the stereo result and
confidence in the region of interest having little height, moderate
height or significant height. The values are such that 0.0 signifies no
confidence while 1.0 signifies "perfect" confidence. An example of the
bitmap and the confidence values are also shown in Figure 5-6.

Figure 5-7 is an example of STEREOSYS results where the region of
interest has no height.




5.4.  Fully  Automatic  Use 

One important objective for STEREOSYS was that it be flexible
enough to work reliably with all sorts of regions and in concert with
SPAM. To test STEREOSYS against these goals, SPAM was given access to
STEREOSYS for the purpose of stereo verification while trying to
interpret the Washington D.C. National Airport area. STEREOSYS was
called upon to give a verification analysis of 70 regions. Table 5-1 lists
the confidence results for these regions of interest. The Human
Interpretation column gives the correct interpretation for each region.
The SPAM Hypothesis column gives the SPAM hypothesis used in
invoking STEREOSYS. Exact interpretation of the Low, Med and High
columns depends on what hypothesis SPAM had for the region when it
invoked STEREOSYS. For example, if the hypothesis was for a low
object like tarmac then Low would indicate a range in heights of 0-1
meters; Med 1-5 meters; and High 5-infinity. But, if the hypothesis
was for a moderately tall object such as a hangar then Low would
indicate a range of 0-5 meters; Med 5-12 meters; and High 12-infinity.
A similar broadening of ranges would hold for very tall hypotheses but
in this case of airport analysis, such hypotheses are not used.
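
The dependence of the Low, Med and High columns on the hypothesis can be written down as a small lookup, sketched below with the two sets of ranges quoted above. The class labels "low" and "moderate", the name `height_band`, and the half-open treatment of range boundaries are assumptions of this sketch.

```python
import math

# Height ranges in meters, as quoted in the text for the two hypothesis classes.
HEIGHT_RANGES = {
    "low": {"Low": (0, 1), "Med": (1, 5), "High": (5, math.inf)},        # e.g. tarmac
    "moderate": {"Low": (0, 5), "Med": (5, 12), "High": (12, math.inf)},  # e.g. hangar
}


def height_band(height_m, hypothesis_class):
    """Return the column (Low, Med or High) a height falls in for a hypothesis class."""
    for band, (lower, upper) in HEIGHT_RANGES[hypothesis_class].items():
        if lower <= height_m < upper:  # boundary assignment is an assumption
            return band
    raise ValueError("height must be non-negative")


same_height_low_hypothesis = height_band(3.0, "low")
same_height_hangar_hypothesis = height_band(3.0, "moderate")
```

The same 3 meter object lands in Med under a tarmac-like hypothesis and in Low under a hangar-like one, which is why the columns cannot be compared across rows without the hypothesis.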

Close examination of Table 5-1 reveals that as the Result confidence
decreases height confidences tend to move toward Low. This is
because disparity images with low Result confidences are randomly
noisy. This causes the mean values for the areas within and without the
region of interest to become nearly equal. The heuristics calculating
height confidences rely mostly on the difference in these means; no
difference indicates no height. The very few cases where poor results
create confidence in the region being tall happen when the region of
interest is very small and happens to lie on a random dark area of the
disparity image.

Table 5-2 summarizes the test by categorizing result confidence
values. This data primarily reflects how often the system was able to
properly register the stereo pair. Result confidences of over 0.6 (out of
1.0) reflect good registration. Result confidences below 0.4 reflect
cases where the system was probably unable to determine the shifts
necessary to bring the stereo pair into registration. Values 0.4 - 0.6 can
be caused by areas cluttered with high objects, highly nondescript
areas or registration problems. Poor results due to bad registration can
be alleviated through the addition of correspondence control between
the database images to achieve a better initial camera model.
However, this is unlikely to be a viable solution in practice due to the
expense of adding ground control points. The remaining problems
like nondescript areas and moving objects are inherent in the stereo
process itself and are not dealt with in this work. Table 5-2 also
summarizes how well the confidence results agreed with human height
evaluation for the regions being verified. For the purposes of this
evaluation a "winner take all" strategy is used. That is, the height
confidence range having the highest confidence was deemed to be the
height assigned to the region by STEREOSYS.
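
The "winner take all" assignment and the three result confidence bands can be sketched as below. The function names and band labels are hypothetical, and treating exactly 0.6 and exactly 0.4 as belonging to the upper band is an assumption; ties between height confidences fall to the first of Low, Med, High.

```python
def winner_take_all(low, med, high):
    """Height range with the highest confidence (ties go to the earlier entry)."""
    scores = {"Low": low, "Med": med, "High": high}
    return max(scores, key=scores.get)


def result_band(result_confidence):
    """Coarse reading of the Result confidence, following the bands in the text."""
    if result_confidence >= 0.6:
        return "good registration"
    if result_confidence >= 0.4:
        return "clutter, nondescript area, or registration problem"
    return "registration probably failed"


# Rows R01 and R06 of Table 5-1: (Result, Low, Med, High).
r01 = (0.86, 0.98, 0.02, 0.00)
r06 = (0.44, 0.21, 0.60, 0.19)
r01_height = winner_take_all(*r01[1:])
r06_height = winner_take_all(*r06[1:])
```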

The careful reader will notice that the "Human Agreement" value
in Table 5-2 does not decrease when the "Confidence" value is below
0.4. The disparity image results for regions with such low confidence
are usually randomly noisy. Statistically, this causes the difference in
mean disparity values between the areas inside and outside the region
to approach zero which, in turn, causes the STEREOSYS height
confidence heuristics to favor a low height interpretation. Of the 70 test
cases presented, only eight are of regions with significant height and of
these, only one created a result confidence below 0.4. Since the
remaining poor result confidences were caused by regions without
height, and STEREOSYS favors a low height interpretation in cases
where it can't calculate an answer, this somewhat inflates the percent
of human agreement within the low confidence range.

Tables 5-3 through 5-6 are confusion matrices showing the
performance of STEREOSYS over several result confidence ranges. The
table columns are the number of times a SPAM hypothesis was correct
or incorrect, with respect to height, as compared to a human
interpretation. For example, if SPAM hypothesized a low height region,
say grassy area, and the region was any other low height region, say
tarmac, then the hypothesis was deemed correct. The table rows
indicate the number of times STEREOSYS confirmed or rejected the
SPAM hypothesis. A perfect result would find STEREOSYS always
confirming correct hypotheses and rejecting the incorrect hypotheses,
i.e., zeros in the lower left and upper right elements of the confusion
matrix.
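
As a worked reading of these matrices, the fraction of cases on the diagonal (correct hypotheses confirmed plus incorrect hypotheses rejected) can be computed as below. The name `agreement_rate` is hypothetical and this summary figure is not one the paper reports; the counts are those of Tables 5-3 and 5-6.

```python
def agreement_rate(confirmed_correct, confirmed_incorrect,
                   rejected_correct, rejected_incorrect):
    """Fraction of cases where the stereo verdict matched the truth of the hypothesis."""
    total = (confirmed_correct + confirmed_incorrect
             + rejected_correct + rejected_incorrect)
    return (confirmed_correct + rejected_incorrect) / total


all_results = agreement_rate(31, 5, 14, 20)   # Table 5-3, 70 cases
high_confidence = agreement_rate(9, 0, 1, 9)  # Table 5-6, 19 cases
```

Restricting attention to the highest result confidences raises the diagonal fraction from 51 of 70 cases to 18 of 19.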

Table 5-7 indicates that STEREOSYS performs well with objects
having height. One initial concern with the SI stereo process is that
often, when it is initiated with too small a disparity range, the SI
method will not converge to a useful result. This could be the case
when SPAM hypothesizes an object with no height and in reality the
object has significant height. To lessen the chance of this problem
occurring we tried to be generous in the size of our three standard
height ranges. In the one case where this situation actually occurred,
STEREOSYS produced the correct response.

5.5. A Detailed Example

As an example of how stereo verification can aid image analysis, this
Section describes one of the 70 invocations of STEREOSYS by SPAM
from Section 5.4. This Section is included also to give the reader a
flavor for how SPAM, a rule-based production system, utilizes stereo
verification. As mentioned in the introduction, SPAM may invoke
STEREOSYS in one of two modes. During early stages of scene
interpretation SPAM gathers low-level information by testing newly
generated regions for height in order to develop an initial set of
hypotheses for the region. During later processing, as collections of
regions begin to be combined into components of the airport model,
STEREOSYS is employed to disambiguate between two or more
plausible but conflicting hypotheses. This Section describes the
former situation by showing extracts from the SPAM and STEREOSYS
execution traces. These extracts have been edited slightly to enhance
their readability.

Figure 5-8 contains several of the SPAM OPS5 [14] rules that lead to the
invocation of STEREOSYS. The first rule, region-to-fragment::get-depth,
is used to recognize an appropriate point for the invocation of
STEREOSYS. The firing of this rule causes SPAM to change its operating
context to the generate-depth-info task. The next two rules are
examples of rules activated by this context. They will set up the
STEREOSYS parameters appropriate for the region of interest's current
best hypothesis based on an assigned confidence value. These




REG   Human            SPAM             Confidences
ID    Interpretation   Hypothesis       Result  Low    Med    High

R01   Runway           Runway           0.86    0.98   0.02   0.00
R02   Runway           Runway           0.42    0.98   0.02   0.00
R03   Runway           Runway           0.33    0.98   0.02   0.00
R04   Parking-apron    Hangar           0.34    0.98   0.02   0.00
R05   Grassy-area      Parking-apron    0.28    0.42   0.48   0.10
R06   Runway           Runway           0.44    0.21   0.60   0.19
R07   Runway           Runway           0.40    0.82   0.16   0.02
R08   Taxiway          Parking-apron    0.36    0.30   0.57   0.13
R09   Taxiway          Hangar           0.58    0.13   0.57   0.30
R10   Taxiway          Hangar           0.29    0.98   0.07   0.00
R12   Taxiway          Parking-apron    0.69    0.70   0.26   0.08
R13   Taxiway          Hangar           0.58    0.51   0.41   0.06
R14   Taxiway          Hangar           0.23    0.75   0.22   0.03
R15   Taxiway          Hangar           0.39    0.98   0.02   0.00
R16   Taxiway          Hangar           0.37    0.98   0.02   0.0?
R17   Taxiway          Hangar           0.78    0.22   0.60   0.16
R18   Taxiway          Parking-lot      0.32    0.98   0.32   0.00
R19   Taxiway          Terminal         0.27    0.98   0.02   0.00
R20   Grassy-area      Hangar           0.32    0.87   0.21   0.0?
R22   Grassy-area      Parking-apron    0.46    0.43   0.48   0.09
R23   Grassy-area      Hangar           0.21    0.19   0.60   0.21
R25   Grassy-area      Hangar           0.31    0.98   0.02   0.00
R26   Grassy-area      Parking-apron    0.40    0.98   0.02   0.00
R27   Grassy-area      Hangar           0.94    0.55   0.3?   0.07
R28   Grassy-area      Parking-apron    0.40    0.50   0.42   0.09
R29   Grassy-area      Hangar           0.30    0.27   0.86   0.19
R30   Grassy-area      Hangar           0.67    0.98   0.02   0.00
R32   Grassy-area      Hangar           0.73    0.??   0.02   0.00
R35   Grassy-area      Parking-apron    0.47    0.98   0.02   0.0?
R36   Grassy-area      Hangar           0.57    0.98   0.02   0.00
R37   Grassy-area      Hangar           0.66    0.67   0.36   0.07
R38   Grassy-area      Parking-apron    0.26    0.39   0.61   0.10
R40   Grassy-area      Hangar           0.62    0.89   0.10   0.01
R41   Parking-lot      Parking-apron    0.20    0.98   0.02   0.00
R43   Parking-lot      Runway           0.48    0.98   0.07   0.30
R44   Parking-lot      Hangar           0.88    0.98   0.??   0.00
R45   Parking-lot      Parking-apron    0.48    0.36   0.??   0.11
R46   Parking-lot      Parking-apron    0.75    0.18   0.?9   0.29
R47   Parking-lot      Parking-apron    0.29    0.98   0.62   0.00
R48   Parking-lot      Parking-apron    0.26    0.24   0.19   0.17
R49   Parking-lot      Hangar           0.81    0.60   0.42   0.09
R50   Grassy-area      Grassy-area      0.2?    0.98   0.02   0.06
R51   Hangar           Hangar           0.69    0.?6   0.19   0.79
R52   Terminal         Hangar           0.42    0.05   0.19   0.79
R53   Hangar           Hangar           0.99    0.09   0.17   0.44
R54   Grassy-area      Parking-apron    0.76    0.60   0.34   0.06
R55   Taxiway          Hangar           0.22    0.67   0.37   0.06
R56   Taxiway          Terminal         0.30    0.41   0.49   0.10
R57   Hangar           Parking-apron    0.68    0.06   0.19   0.76
R58   Hangar           Hangar           0.26    0.74   0.23   0.03
R62   Parking-lot      Hangar           0.85    0.96   0.02   0.03
R64   Grassy-area      Parking-apron    0.40    0.12   0.66   0.32
R65   Grassy-area      Parking-apron    0.39    0.51   0.41   0.06
R66   Grassy-area      Grassy-area      0.36    0.27   0.59   0.15
R67   Grassy-area      Parking-apron    0.49    0.98   0.02   0.00
R68   Grassy-area      Parking-apron    0.39    0.65   0.29   0.06
R73   Terminal         Hangar           0.67    0.10   0.52   0.38
R75   Terminal         Hangar           0.65    0.09   0.50   0.41
R77   Terminal         Hangar           0.60    ?      0.39   0.54
R78   Tarmac           Parking-apron    0.69    0.99   0.02   0.00
R79   Parking-apron    Parking-apron    0.43    0.9?   0.05   0.00
R80   Parking-apron    Parking-apron    0.36    0.54   0.38   0.08
R81   Tarmac           Parking-apron    ?       0.98   0.12   0.00
R82   Parking-apron    Grassy-area      0.50    0.98   0.02   0.00
R83   Tarmac           Parking-apron    0.55    0.20   0.59   0.21
R84   Parking-apron    Hangar           0.39    0.96   0.02   0.00
R85   Tarmac           Parking-apron    0.17    0.91   0.06   0.01
R87   Road             Taxiway          0.29    0.07   0.36   0.57
R94   Road             Road             0.48    0.14   0.?7   0.29
R95   Road             Runway           0.27    0.96   0.02   0.00

Table 5-1: Test Results


Confidence    % of Tests    % Human Agreement

0.6-1.00      27.1          89.5
0.5-.599       8.6          66.7
0.4-.499      20.0          64.3
0.0-.399      44.3          67.2

Table 5-2: Test Summary

Hypothesis    Correct    Incorrect

Confirmed     31         5
Rejected      14         20

Table 5-3: All Result Confidences [0.0 - 1.0]

Hypothesis    Correct    Incorrect

Confirmed     ?          ?
Rejected      7          12

Table 5-4: Result Confidences [0.4 - 1.0]

Hypothesis    Correct    Incorrect

Confirmed     10         1
Rejected      2          12

Table 5-5: Result Confidences [0.5 - 1.0]

Hypothesis    Correct    Incorrect

Confirmed     9          0
Rejected      1          9

Table 5-6: Result Confidences [0.6 - 1.0]

Hypothesis    Correct    Incorrect

Confirmed     6          0
Rejected      0          1

Table 5-7: Regions with Actual Height [0.4 - 1.0]


parameters are an indication of the expected height range and height
clutter for the region of interest. The rules for setting STEREOSYS
parameters appropriate for a runway or a hangar hypothesis are shown.
The rule applicable to the hypothesis with the highest confidence value
will fire. Along with setting the necessary parameters, the rule firing
will change the context to get-depth in order to fire the next rule,
specific::get-region-depth, which actually invokes the STEREOSYS
process.

Figure 5-9 is an excerpt from the SPAM trace just before it invoked
STEREOSYS. The region of interest is Hand36809-R.37_0 ("R37" for
short). Rule firings 853 to 856 step through the development of
hypothesis confidences for region R37. By the end of this sequence of
rules the region had a 0.94 confidence of being a hangar and a 0.68
confidence of being a grassy area. These interpretations were based on
weak heuristics and measurements such as 2D shape and texture.
SPAM also uses knowledge concerning the spatial consistency of this
hypothesis with other region hypotheses in the airport scene to
evaluate confidence, but this knowledge is applied after an initial
assignment of plausible hypotheses based upon these simple
measures. In order to avoid a combinatorial explosion of hypotheses
at the spatial consistency phase, it is important that STEREOSYS be able
to refute the incorrect hangar hypothesis.

Rule firing 857 changed the operation context to the get-depth task
because the hangar hypothesis had the highest confidence of any
interpretation for this region. The get-depth task rule is used to set up
the invocation of the STEREOSYS process by SPAM. Based on the type
of hypothesis and other knowledge about airport organization, SPAM
selects parameters for height-range and clutter to be used during rule
firing 858. Finally, using these parameters set for finding height
information about typical hangars, STEREOSYS was invoked.
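The selection step described above can be illustrated with a small Python sketch. This is not the OPS5 implementation: the function and table names are hypothetical, and the parameter values are those read from the runway and hangar rules in the listing (the runway range "0-6" is taken as printed there). It only shows the idea that the highest-confidence hypothesis decides which height range and clutter setting are passed to the stereo process.

```python
# Hypothetical Python rendering of the parameter-selection step.
# Values follow the runway and hangar rules; all names are illustrative.
DEPTH_PARAMETERS = {
    "runway": {"height_range": "0-6", "clutter": "isolated"},
    "hangar": {"height_range": "10-inf", "clutter": "cluttered"},
}


def select_depth_parameters(hypotheses):
    """Pick stereo parameters for the highest-confidence hypothesis.

    `hypotheses` maps hypothesis name -> confidence in [0, 1].
    Returns (name, parameters), or None if the best hypothesis has no rule.
    """
    best = max(hypotheses, key=hypotheses.get)
    parameters = DEPTH_PARAMETERS.get(best)
    if parameters is None:
        return None
    return best, parameters


# Region R37 before stereo verification: hangar 0.94, grassy area 0.68.
choice = select_depth_parameters({"hangar": 0.94, "grassy-area": 0.68})
```

For R37 the hangar hypothesis wins, so the sketch selects the 10-inf height range and the cluttered setting, matching the parameters listed at step A of the STEREOSYS trace.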




; Once a region has been interpreted, switch to the depth task.
(p region-to-fragment::get-depth
    { <context> (context ^task region-to-fragment
                         ^datum <token>) }
    (region ^token <token> ^region-status interpreted)
  - (context ^task <> region-to-fragment)
  - (store-results ^result-one class-match)
  - (store-results ^result-one subclass-match)
-->
    (remove <context>)
    (make context ^task generate-depth-info
                  ^datum <token>))

; Fires when runway is the highest-confidence hypothesis.
(p depth::get-runway-depth
    { (context ^task generate-depth-info
               ^datum <token>) <context> }
    (fragment ^region-token <token> ^hypothesis runway
              ^confidence <c>)
  - (fragment ^region-token <token>
              ^hypothesis <> unknown ^confidence > <c>)
  - (context ^task <> generate-depth-info)
-->
    (remove <context>)
    (bind <height-estimate> 0-6)
    (bind <cluttering> isolated)
    (make context ^task get-depth
                  ^datum <token> <height-estimate>
                  <cluttering> runway))

; Fires when hangar is the highest-confidence hypothesis.
(p depth::get-hangar-depth
    { (context ^task generate-depth-info
               ^datum <token>) <context> }
    (fragment ^region-token <token> ^hypothesis hangar
              ^confidence <c>)
  - (fragment ^region-token <token> ^hypothesis <> unknown
              ^confidence > <c>)
  - (context ^task <> generate-depth-info)
-->
    (remove <context>)
    (bind <height-estimate> 10-inf)
    (bind <cluttering> cluttered)
    (make context ^task get-depth
                  ^datum <token> <height-estimate> <cluttering> hangar))

; Invokes the STEREOSYS process for the region.
(p specific::get-region-depth
    { (context ^task get-depth
               ^datum <regtok> <height-est> <regcontext>
               <hyp>) <context> }
    (global-status ^current-image <img>)
    (interp-constants ^output-file <outfile>)
    { <region> (region
                 ^token <regtok> ^symbolic-name <symname>) }
-->
    (call depth <symname> <img> APCAL <height-est>
          <regcontext> -o <outfile>)
    (remove <context>)
    (modify <region> ^depth-low <lowdepth>
            ^depth-moderate <moddepth> ^depth-high <highdepth>))




Figure 5-10 gives extracts from the STEREOSYS trace of region R37.
An explanation of the trace, coded by the bold capital letters, follows:


• A: The input parameters are listed. The parameter
Height_range determines what disparity range S1 will use.
In this case it was set to 10-inf because SPAM thought R37
might be a hangar, which is usually 10 or more meters high.

• B: The database is used to find the boundary list file and
centroid for R37.

• C: Again the database is used to create an unsorted "sec"
coverage file of images containing R37.

• D: The coverage file is sorted; the best images are
selected; and the extraction regions are calculated. Notice
that since the rotation value is so near zero the later
rotation steps are skipped over and replaced by simple
UNIX moves (mv).

• E: The Ortho process extracts an orthographically rectified
stereo pair.

• F: S1 is invoked to calculate any misregistration between
the pair. The calculated offset is not shown in the trace,
but for this particular region it was 13 pixels vertically and
14 pixels horizontally.

• G: Ortho is called to extract a new Right image based on
the calculated offset.

• H: S1 is again invoked to calculate any remaining offset
and produce a disparity image.

• I: The boundary list for R37 is warped, or rectified, to
overlay the left image extraction area. The result is a
"seg" file which is converted to a bitmap.

• J: The disparity image is analyzed; the statistics are
converted into confidence values and the results are sent
back to SPAM.

These particular results are interesting in that SPAM sent R37 to
STEREOSYS with a current hypothesis that R37 was a hangar but got
back a fairly confident indication that there was no appreciable height
present. That is, the result confidence was 0.66 with a 0.57 confidence
that R37 had little to no height. Figure 5-11 shows R37's originally
extracted stereo pair, the disparity result and region bitmap.

The effect of temporal change and the relative insensitivity of
STEREOSYS to such changes is evident in this experiment. Notice that
the plane on the taxiway in the Right image has moved a significant
distance toward the top of the area of interest in the Left image. This
causes two artifacts in the disparity image calculated by STEREOSYS.
The area in the Left image where one would have expected to find a
plane corresponding to the one in the Right image shows a very large
disparity (height) estimate. Similarly, the missing plane in the Right
image shows a very small (nearly zero) disparity estimate. Since we are
looking at global statistics over the entire region bitmap and the
surrounding area of interest, these anomalies represent a small portion
of the statistical sample. It is probably the case that the result and
height confidence measures would have been a little better had this
situation not occurred. However, small temporal changes can be
expected to occur, and it is important for the verification system to be
relatively insensitive to them.
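The robustness argument above can be made concrete with a short Python sketch. It assumes, for illustration only, that the "mean difference" is the mean disparity inside the region bitmap minus the mean disparity of the surrounding area, and it does not attempt to reproduce how STEREOSYS converts statistics into confidence values. The function name and the synthetic data are hypothetical.

```python
from statistics import mean, pstdev


def region_statistics(disparity, bitmap):
    """Compare disparities inside the region bitmap with the surrounding area."""
    inside, outside = [], []
    for disparity_row, bitmap_row in zip(disparity, bitmap):
        for value, in_region in zip(disparity_row, bitmap_row):
            (inside if in_region else outside).append(value)
    return {
        "mean_difference": mean(inside) - mean(outside),
        "region_stddev": pstdev(inside),
        "background_stddev": pstdev(outside),
    }


# A flat 20x20 scene whose central 10x10 block is the region of interest.
size = 20
bitmap = [[5 <= r < 15 and 5 <= c < 15 for c in range(size)] for r in range(size)]
flat = [[0.0] * size for _ in range(size)]

# Simulate a moved aircraft: four region pixels get a spuriously large disparity.
disturbed = [row[:] for row in flat]
for r, c in [(7, 7), (7, 8), (8, 7), (8, 8)]:
    disturbed[r][c] = 50.0

clean_stats = region_statistics(flat, bitmap)
disturbed_stats = region_statistics(disturbed, bitmap)
```

In this synthetic case four anomalous pixels out of 100 shift the mean difference from 0 to 2 disparity units even though each outlier is 50 units, which is the sense in which a global statistic dilutes a small temporal change.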


853. region-to-fragment::generate-subclass-match 2939
     2912 2938 CONFID-LIST (1.0 .909238 0.0 .053321)
     Match of hangar region Hand36809-R.37_0 = .9408
854. interpret-as-hangar 2946 14 2912 2935 2938
     Interpreting region Hand36809-R.37_0
855. region-to-fragment::generate-subclass-match 2923
     2912 2976 CONFID-LIST (1.0 .918012 .124367)
     Match of grassy-area region Hand36809-R.37_0 = .6807
856. interpret-as-grassy-area 2978 12 2912 2974 2978
     Interpreting region Hand36809-R.37_0
857. region-to-fragment::get-depth 2912 2984
858. depth::get-hangar-depth 2986 2984
859. specific::get-region-depth 2988 29?2 2 2984


Figure 5-9: SPAM Execution


A: STEREOSYS: Region_id = Hand36809-R.37_0
              Generic = acreage
              Region_type = APCAL
              Height_range = 10-inf
              Clutter = cluttered
   STEREOSYS: Region map file key is R09-37
B: D3_INFO:
     D3_file = /[illegible]/airport/Hand36809-R/37_0.d3
     X = 139841.816519  Y = 277348.270188
C: STEREOSYS:
     d3entcor /[illegible]/airport/Hand36809-R/37_0.d3 R09-37.sec
D: STEREOSYS: sorting SEC file by stereo coverage
   STEREOSYS: selecting best coverage:
     Left = dc36809 (ScaleL = 0.00008)
     Right = dc36808 (ScaleR = 0.00008)
     HalfHeight = 82  HalfWidth = 44
     Rotation = 0.782102
     Dellat = 0.040688  Dellon = 0.083419
E: STEREOSYS:
     ortho dc36809 38 50 34 902 77 2 23 472
                   38 50 49 330 77 2 3? 03
           -a 1.058762 R09-37.l.tmp
   Mapping the image of dc36809 to the box formed by:
     lat N38 50 34 (902) lon W77 2 23 (472)
     and lat N38 50 48 (130) lon W77 2 3? (??)
   Requested gridsize: 1.06 meters
   Actual gridsize: .0407x.0425 sec. (1.03x1.06 meters)
   Size of result: 60390 bytes (330 rows X 183
   STEREOSYS:
     ortho dc36808 38 50 34 902 77 2 23 472
                   38 50 46 330 77 2 3? 85
           -a 1.058762 R09-37.r.pre
F: STEREOSYS: Offset cmd file created = ?25825.off.tmp
   STEREOSYS: s1 ?25825.off.tmp
G: STEREOSYS:
     ortho dc36808 38 50 34 374 77 2 22 741
                   38 50 47 802 77 2 32 354
           -a 1.058762 R09-37.r.tmp
   STEREOSYS: mv R09-37.r.tmp R09-37.right
   STEREOSYS: mv R09-37.l.tmp R09-37.left
H: STEREOSYS: S1 command file created = ?25825.cmd.tmp
   STEREOSYS: s1 ?25825.cmd.tmp
I: STEREOSYS: Created warped SEG file R09-37.w.seg
   STEREOSYS:
     segtoimg R09-37.w.seg -o R09-37.w.tmp -l R09-37.left
   STEREOSYS: mv R09-37.w.tmp R09-37.bitmap
J: STEREOSYS: Stereo statistics for Hand36809-R.37_0
     Mean difference: 3.41722
     Region stddev: 28.9986
     Backgnd stddev: 34.3314
     Result confidence: 0.661236
     Low depth confidence: 0.57?924
     Moderate depth confidence: 0.330413
     High depth confidence: 0.065663


Figure 5-10: STEREOSYS Execution




6.  Conclusions 

We believe that using height information in verification of aerial
image analysis is an important approach and that the general stereo
verification steps of Section 4 are minimal and applicable to all image
analysis supported by an image database. In this context, our work
with STEREOSYS has explored the pertinent issues and found viable
solutions to the following important questions:

1. How can an aerial image database automatically generate a
useful stereo pair containing an arbitrary region?

2. How can a stereo system handle the misregistration
problems inherent in variable sourced image databases?

3. What kind of stereo results are appropriate for use in a
verification process?

4. How can stereo results be analyzed so as to reflect not only
the presence (or absence) of height but also the inherent
reliability of the results?

STEREOSYS is not an infallible stereo verification system, as indicated
by the experimental results presented in Section 5. However,
STEREOSYS is a highly flexible system that accomplishes the entire
stereo process automatically, from selecting image coverage to
analyzing the stereo results, while using an image database that has less
than perfect image correspondence capabilities. From this viewpoint,
we feel STEREOSYS has demonstrated the potential use of stereo
verification in aerial image analysis.


If one defines stereo verification, as we do, to be a process whose
purpose is to give a simple indication of the depth of one region in an
image relative to the rest of the image, then stereo verification can be
seen to be applicable to any domain where the identification of regions
with significant differences in depth is important. For example, stereo
verification could be useful for collision avoidance in mobile robotics
or for the initial locating of tall objects in aerial photographs. This is
especially true if an emphasis is placed on the use of fast and flexible
processes. STEREOSYS has shown itself to be flexible but lacking in
speed, primarily due to the necessity for subimage rectification during
the extraction of the stereo pair images. The registration step, needed
to determine the offset in the originally extracted pair, is also time
consuming. Approximate time for each experiment is about 20 cpu
minutes using a VAX 11/7xx under the UNIX operating system.
However, we believe stereo verification can be done far more
efficiently, particularly by using specialized hardware whose
architecture is tailored to the matching and rectification algorithms. If
so, this method can be a powerful component of a knowledge-based
image analysis system and can greatly improve its ability to generate an
accurate scene description.

Many different approaches to performing passive photographic
stereo have been studied and several have been implemented, but few
have been incorporated into systems that accomplish anything useful
beyond producing aesthetic results if given a tightly controlled stereo
pair. Flexible stereo verification is a useful application of stereo
processes. Our work has outlined the general process of stereo
verification and has studied how one stereo process, S1, can do useful
verification work. We believe that one immediate direction of study in
stereo verification should be in the testing of other known stereo
processes in stereo verification systems.


7.  Acknowledgements 

Wilson Harvey integrated STEREOSYS into the SPAM rule-based aerial
photo interpretation system and enabled us to run experiments using
realistic system hypotheses and machine generated segmentations.
Steven Shafer and John McDermott commented extensively on an
earlier draft of this paper. Gudrun Klinker and Victor Milenkovic
provided additional helpful comments.

Clifford McVay was a Visiting Researcher to the CMU-CSD during
the academic year 1984-1985 from the Defense Mapping Agency
Aerospace Center in St. Louis (DMAAC). Bruce Lucas is currently a
System Designer in the Information Technology Center (ITC) at
Carnegie-Mellon working on computer graphics and user interfaces.
David McKeown is currently a Senior Project Scientist in the
Computer Science Department working in Digital Mapping and Aerial
Photo Interpretation.

This research was partially sponsored by the Defense Advanced
Research Projects Agency (DOD), ARPA Order No. 3597, monitored
by the Air Force Avionics Laboratory under Contract F33615-81-K-1539,
and by the Defense Mapping Agency under Contract DMA
800-85-C-0009. The views and conclusions contained in this document
are those of the authors and should not be interpreted as representing
the official policies, either expressed or implied, of the Defense
Advanced Research Projects Agency, of the Defense Mapping Agency,
or the U.S. Government.


8.  References 

1. McKeown, D.M., and McDermott, J., "Toward Expert Systems
for Photo Interpretation", IEEE Trends and Applications '83,
May 1983, pp. 33-39.

2. McKeown, D.M., Harvey, W.A., and McDermott, J., "Rule
Based Interpretation of Aerial Imagery", IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 5,
September 1985, pp. 570-585.

3. McKeown, D.M., and Pane, J.F., "Alignment and Connection
of Fragmented Linear Features in Aerial Imagery", Proceedings of
IEEE Computer Vision and Pattern Recognition Conference,
June 1985. Also available as Technical Report CMU-CS-85-122.

4. McKeown, D.M., Denlinger, J.L., "Map-Guided Feature
Extraction from Aerial Imagery", Proceedings of Second IEEE
Computer Society Workshop on Computer Vision:
Representation and Control, May 1984. Also available as
Technical Report CMU-CS-84-117.

5. McKeown, D.M., "MAPS: The Organization of a Spatial
Database System Using Imagery, Terrain, and Map Data",
Proceedings: DARPA Image Understanding Workshop, June
1983, pp. 105-127. Also available as Technical Report
CMU-CS-83-136.

6. McKeown, D.M., "Digital Cartography and Photo
Interpretation from a Database Viewpoint", in New
Applications of Databases, Gardarin, G. and Gelenbe, E., eds.,
Academic Press, New York, N.Y., 1984, pp. 19-42.

7. Lucas, B. D., Generalized Image Matching by the Method of
Differences, PhD dissertation, Computer Science Department,
Carnegie-Mellon University, July 1984.

8. Moravec, H. P., Obstacle Avoidance and Navigation in the Real
World by a Seeing Robot Rover, PhD dissertation, Computer
Science Department, Stanford University, September 1980.

9. Baker, H. H., "Depth from Edge and Intensity Based Stereo",
AIM 347, Stanford Artif. Intell. Lab, 1981.

10. Ohta, Y. and Kanade, T., "Stereo by Intra- and Inter-scanline
Search Using Dynamic Programming", IEEE PAMI, Vol.
PAMI-7, No. 2, March 1985, pp. 139-154.

11. Herman, M., Kanade, T., and Kuroe, S., "Incremental
Acquisition of a Three-Dimensional Scene Model from
Images", IEEE-PAMI, Vol. PAMI-6, No. 3, May 1984, pp.
331-340.

12. Panton, D. J., "A Flexible Approach to Digital Stereo
Mapping", Photogramm. Eng. Remote Sensing 44, December
1978, pp. 1499-1511.

13. Henderson, R. L., Miller, W. J., Grosch, C. P., "Automatic
Stereo Reconstruction of Man-made Targets", Computer
Vision, Graphics and Image Processing, Vol. 186, 1979.

14. Vocar, J. M. and Faiss, R. O., "Image Magnification on
STARAN", Tech. report GER-16342, Goodyear Aerospace
Corporation, August 1976.

15. Terzopoulos, D., "Multi-Resolution Computation of Visible-
Surface Representations", Tech. report, Dept. of EE & CS,
MIT, 1983, Ph.D. Thesis.

16. Forgy, C. L., "The OPS5 User's Manual", Tech. report,
Carnegie-Mellon University, Department of Computer Science,
1981.

17. Barnard, S. T. and Fischler, M. A., "Computational Stereo",
Computing Surveys, Vol. 14, No. 4, December 1982, pp.
553-572.




THE  TERRAIN-CALC  SYSTEM 


Lynn  H.  Quam,  Artificial  Intelligence  Center 


SRI  International 

333  Ravenswood  Avenue,  Menlo  Park,  California  94025 


OVERVIEW 


Terrain-Calc is a system for synthesizing realistic
sequences of perspective views of real-world terrain that is
described by a database consisting of geometric and
photometric models. The geometry of the surface is
described by a digital terrain model, which is a
2-dimensional array of elevations defined on a regular grid.
The photometry of the terrain is described by a source
image covering all or part of the area contained in the
terrain model. This image is geometrically related to the
terrain model by a projection (usually a perspective
projection) that relates world coordinates to image
coordinates.

The  image-synthesis  process  is  approximately 
equivalent  to  the  following  physical  analogue: 

1. Create a physical model of the terrain using a
construction material that has a Lambertian
reflectance function.

2. Project the source image onto the terrain model
using a projector with proper focal length, placed at
the proper position and orientation (equivalent to
the perspective projection model relating the source
image to the terrain).

3. View the physical terrain model with a camera
having the desired focal length, position, and
orientation.

Views constructed according to this description are
approximately what would have been seen by a camera as
defined by (3) over the actual terrain at the same time
that the source image was acquired. The differences are
due to the following effects:

• The geometric and photometric models are limited
in resolution and accuracy.

• Portions of the surface that should be visible in the
synthesized view were not visible in the source
image.

• The actual surface materials do not obey Lambert's
Law.

The view-synthesis algorithm is related to a
technique developed by the computer graphics community
called "texture mapping" (Quam 1971), (Blinn 1978),
(Catmull 1980). A novel algorithm is used in Terrain-Calc
to avoid aliasing that results from violating the sampling
theorem.


THE MODELS

The geometry of the surface is described by a digital
terrain model that is a 2-dimensional array of elevations
z(x,y), where x and y are defined on a regular grid. Each
square of the grid is cut into two planar triangular facets,
choosing the diagonal that maximizes the angle between
the normals to the two triangular facets.

The photometry of the terrain is defined by a
digitized source image covering all or part of the
geometric model. It is assumed that the surface materials
obey Lambert's Law, which makes it possible to generate
relatively realistic views without detailed modeling of the
surface materials.

The relationship between digitized pixel values in
the source image and real-world luminous flux at the
surface is generally unknown because of the many
parameters in the film processing chain before the image
is digitized and because of the effects of atmospheric
scattering. For images acquired with calibrated sensors, it
would be possible to synthesize views where the light
source is at a position different from that in the source
image, so long as the terrain obeys Lambert's law.


THE VIEW-SYNTHESIS ALGORITHM

Views are synthesized by iterating over the
triangular facets in the terrain model, projecting the
vertices of each triangle to the source and view images,
and "warping" each triangular patch in the source image
into its corresponding patch in the view image.

The warp step iterates over pixels on the regular
grid of the synthesized view that are within each triangle,
computing the position of the corresponding pixels in the
source image using the linear transformation that maps
the triangle in the synthesized view into the source image.
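A minimal Python sketch of this warp step is given below. It assumes the triangle-to-triangle map is expressed through barycentric coordinates, which is one standard way to write the linear transformation between two non-degenerate triangles; the function names are hypothetical and no filtering of the source image is done here.

```python
import math


def barycentric(p, a, b, c):
    """Barycentric weights of 2-D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def warp_triangle(view_tri, source_tri, eps=1e-9):
    """Yield ((x, y), (sx, sy)) for each view-grid pixel inside view_tri.

    (sx, sy) is the corresponding position in the source image.
    """
    xs = [x for x, _ in view_tri]
    ys = [y for _, y in view_tri]
    for y in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
        for x in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
            weights = barycentric((x, y), *view_tri)
            if min(weights) < -eps:
                continue  # pixel centre lies outside the triangle
            sx = sum(w * v[0] for w, v in zip(weights, source_tri))
            sy = sum(w * v[1] for w, v in zip(weights, source_tri))
            yield (x, y), (sx, sy)


# A view triangle and a source triangle that is twice as large and shifted.
view_triangle = [(0, 0), (4, 0), (0, 4)]
source_triangle = [(10, 10), (18, 10), (10, 18)]
mapping = dict(warp_triangle(view_triangle, source_triangle))
```

Each view pixel is mapped to a position in the source image; what value is read at that position is a separate question that depends on how the source image is sampled.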

The "warp" operation starts by determining the
sampling relationship between pixels in the synthesized
view and pixels in the source image. For each triangle in
the synthesized view, a circle of one pixel diameter is
constructed at any point in the triangle. Since all of the
triangles are planar, and the following projections do not
include perspective scale change, the particular choice of
point does not matter. A cone is constructed by
projecting this circle through the projection center of the
synthetic camera. This cone is intersected with the
corresponding triangular facet of the terrain model,
forming an ellipse. A second cone is constructed by
projecting this ellipse through the projection center of the
camera for the source view. The intersection of this
second cone with the image plane of the source view
results in an ellipse that corresponds to the circular pixel
in the synthetic view (see Figure 1).


Figure 1: Geometry of the View Synthesis Algorithm


A somewhat more accurate, but also more
complicated, calculation of the sampling relationship
projects a one-pixel-square area from the synthesized
view, rather than a circle, and results in a quadrilateral
area in the source image.

The use of this sampling relationship is essential to
avoid problems due to aliasing, which result from
violating the sampling theorem. To avoid aliasing, each
pixel in the synthetic view is computed by integrating
pixel values in the source image over an elliptical area
corresponding to the pixel.


Terrain-Calc computes an approximation to the
integral over an elliptical area by summing estimates of
integrals over circles that have diameters approximately
equal to the minor axis of the ellipse along a path
corresponding to the major axis of the ellipse. The
circular integrals of various diameters are formed by
convolving the source image with circularly symmetric
Gaussian convolution kernels of varying sizes using the
hierarchical Burt algorithm (Burt, 1981).
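The approximation can be sketched in Python as below. To keep the example self-contained, a plain disc average stands in for the lookup into Gaussian-filtered images produced by the hierarchical algorithm; that substitution, the function names, and the rule for choosing the number of circles are all assumptions of this sketch rather than details of Terrain-Calc.

```python
import math


def disc_average(image, cx, cy, diameter):
    """Mean of pixels whose centres fall inside a disc.

    Stand-in for reading a Gaussian-filtered image of matching kernel size.
    """
    radius = max(diameter / 2.0, 0.5)
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(max(0, math.floor(cy - radius)),
                   min(height - 1, math.ceil(cy + radius)) + 1):
        for x in range(max(0, math.floor(cx - radius)),
                       min(width - 1, math.ceil(cx + radius)) + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius * radius:
                total += image[y][x]
                count += 1
    if count == 0:  # no pixel centre inside the disc: use the nearest pixel
        x = min(max(int(round(cx)), 0), width - 1)
        y = min(max(int(round(cy)), 0), height - 1)
        return float(image[y][x])
    return total / count


def ellipse_average(image, center, major_axis, minor_diameter):
    """Approximate the mean over an ellipse by circles along its major axis.

    `major_axis` is the (dx, dy) vector spanning the full major axis.
    """
    cx, cy = center
    dx, dy = major_axis
    major_length = math.hypot(dx, dy)
    n = max(1, math.ceil(major_length / minor_diameter))
    samples = []
    for i in range(n):
        t = (i + 0.5) / n - 0.5  # evenly spaced positions along the axis
        samples.append(
            disc_average(image, cx + t * dx, cy + t * dy, minor_diameter))
    return sum(samples) / n


# Horizontal ramp: pixel value equals its x coordinate.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
value = ellipse_average(ramp, (8.0, 8.0), (8.0, 0.0), 3.0)
```

On the ramp image the three circles are placed symmetrically about x = 8, so the result is the ramp value at the ellipse centre.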

Another form of aliasing can occur at pixels that
cross occlusion edges, where pixels in the synthetic view
project to several facets in the terrain model. The most
severe problem of this kind occurs at occlusion
boundaries, where the pixel in the synthesized view
projects to widely separated facets in the terrain model.
The correct calculation requires summing the intensity
integrals over two or more partial ellipses corresponding
to the intersections of the cone with each facet.

Hidden-surface elimination is accomplished using a
variation of the "H array" technique of Wright (Wright,
1974), which requires that (1) the facets be processed in a
near-to-far order in relation to the synthetic camera,
(2) the world z-axis always project to a vertical line
in the synthetic view (i.e. no roll to the camera), and
(3) the geometric model be a single-valued function z(x,y).
An improved algorithm (Anderson, 1982), which permits
camera roll and fixes some other problems with the
H-array technique, will be implemented in the near future.
A more conventional z-buffer algorithm would eliminate
all of the above restrictions at additional computational
cost.
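The role of the three restrictions can be seen in a simplified "floating horizon" sketch in Python. It is written in the spirit of an H-array approach and is not Wright's algorithm or the Terrain-Calc variation: it assumes samples arrive in near-to-far order, that each sample lands in one view column (no camera roll), and that a sample is visible only if it rises above everything already drawn in that column. All names are hypothetical.

```python
def visible_samples(samples_near_to_far):
    """Keep only samples that rise above the current horizon of their column.

    Each sample is (column, screen_height, payload), supplied near-to-far.
    """
    horizon = {}  # column -> highest screen height drawn so far
    visible = []
    for column, screen_height, payload in samples_near_to_far:
        if screen_height > horizon.get(column, float("-inf")):
            horizon[column] = screen_height
            visible.append(payload)
    return visible


samples = [
    (0, 5.0, "near ridge"),
    (0, 3.0, "valley behind ridge"),  # below the horizon: hidden
    (0, 8.0, "far peak"),             # rises above the ridge: visible
    (1, 2.0, "near slope"),
    (1, 2.0, "same height behind"),   # does not exceed the horizon: hidden
]
shown = visible_samples(samples)
```

Because only one height per column is stored, the test is cheap, but it is valid only under the ordering and no-roll assumptions stated above; a z-buffer stores a depth per pixel and needs neither.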

The time required by the view-synthesis algorithm is
mainly determined by the number of pixels in the
generated view and the number of facets that must be
examined. For views containing approximately 320 x 250
pixels resulting from 11000 facets, the view-synthesis
algorithm requires about 150 seconds on a Symbolics
3600.

INTERACTIVE  USER  INTERFACE 

Terrain-Calc  also  provides  a  sophisticated  graphical 
interface  for  specifying  flight  paths  and  parameters  of  a 
simulated  camera  (see  Figure  2). 

To specify a flight path, the user first invokes an
interactive curve editor to draw a curve on top of a
vertical view of the terrain model as depicted on the
display screen, thereby specifying the x and y components
of the flight path. Terrain-Calc then displays a graph of
the terrain model profile underneath the flight path,
allowing the user to specify the z component of the flight
path in relation to the terrain profile. A parametric
spline curve is fit to the x, y, and z components of the
flight path, from which position and direction can be
easily computed as a function of distance along the curve.


Figure 2: (a) Terrain-Calc Showing a Synthesized
Image (upper window), the Source DTM with
Elevations Displayed by Brightness
(lower left window), and the Elevation Profile
in the View Direction (lower right window)


Figure 2: (b) An Image Synthesized by Terrain-Calc


Each synthetic view is computed by means of a
perspective projection whose parameters are determined
by the flight path and a parameter menu consisting of
the following:

• Field of View: Horizontal field of view. This is the
angle relating the focal length to the view image
width.

• Tilt: Tilt or pitch of the camera in a vertical plane
with respect to the direction of the flight path.
Currently, tilt must not be large enough to cause
any ray from the camera to be exactly vertical,
because of limitations of the hidden surface
algorithm.

• Pan: Pan or yaw of the camera with respect to the
flight path.

• Sequence Length: Number of equally spaced views
to be generated.

• Draw Mode: Selection of wire frame or synthetic
image views.

A sequence of views spaced at equal distances along
the flight path is generated. For each view, the
combination of flight-path direction, tilt, and pan
determines the direction of the principal camera ray,
which, together with the flight-path position and focal
length, determines all of the parameters of the perspective
projection for the view. Sequences of views that fit in
available physical memory can be dynamically displayed
on the color screen at a rate of about 1.3 million pixels
per second, or sixteen 320 x 250-pixel frames per second.
On a Symbolics 3600 with six megabytes of physical
memory, there is room for about [illegible] frames, each
containing 320 x 250 pixels.

Stereo views are created using two identical
synthetic cameras separated by a user-specified distance
on a horizontal line perpendicular to the direction of the
principal ray. They are displayed either as left-right pairs
of images for viewing using a stereo viewing box to merge
the images or as a cyan/red anaglyph image. Left-right
stereo-pair sequences can be displayed at half the above
frame rate, whereas anaglyph stereo sequences can be
displayed at the full frame rate.
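The generation of equally spaced viewpoints and of a stereo camera pair can be sketched in Python as follows. Several simplifications are assumed: a polyline stands in for the fitted parametric curve, the principal ray is taken to be the flight-path direction (tilt and pan of zero), consecutive path points are distinct, and the world z-axis points up. The function names are hypothetical.

```python
import math


def sample_path(points, n):
    """Return n (position, direction) pairs equally spaced by arc length.

    `points` is a polyline of (x, y, z) tuples standing in for the fitted curve.
    """
    lengths = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(lengths)
    views = []
    for i in range(n):
        s = total * i / (n - 1) if n > 1 else 0.0
        k = 0
        while k < len(lengths) - 1 and s > lengths[k]:
            s -= lengths[k]
            k += 1
        a, b = points[k], points[k + 1]
        t = min(s / lengths[k], 1.0)
        position = tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))
        direction = tuple((pb - pa) / lengths[k] for pa, pb in zip(a, b))
        views.append((position, direction))
    return views


def stereo_positions(position, direction, separation):
    """Left and right camera centres on a horizontal line perpendicular
    to the principal ray (here assumed equal to `direction`)."""
    dx, dy, _ = direction
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("principal ray is vertical; baseline is undefined")
    left_x, left_y = -dy / norm, dx / norm  # unit vector pointing left
    half = separation / 2.0
    x, y, z = position
    left = (x + half * left_x, y + half * left_y, z)
    right = (x - half * left_x, y - half * left_y, z)
    return left, right


# An L-shaped path of total length 70, sampled at 8 equally spaced views.
path = [(0, 0, 100), (30, 0, 100), (30, 40, 100)]
views = sample_path(path, 8)
left_eye, right_eye = stereo_positions(views[4][0], views[4][1], 2.0)
```

With eight views on a path of length 70 the spacing is 10 units, and the fifth view lies 10 units along the second leg of the path.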




UNSOLVED PROBLEMS AND FUTURE DIRECTIONS

A major unsolved problem is how to improve the
efficiency of the projection algorithm by using
hierarchical terrain models, in which the level of the
hierarchy (and therefore the size of the facets) is chosen in
each neighborhood of the terrain model so that there are
no noticeable flaws in the generated views. The use of a
hierarchy is particularly important for the synthesis of
oblique views, where distant facets of the terrain model
project to a small fraction of a pixel in the view.

A simple hierarchical technique is to represent the
terrain at a hierarchy of resolutions, obtained by
convolving the terrain model z(x,y) with Gaussian kernels
of various sizes and then decimating the results. The
view-synthesis algorithm begins using the highest
resolution in the pyramid and, for each row of facets,
keeps track of the distance to the nearest facet in the row.
As this distance increases, the coarser levels of resolution
in the terrain hierarchy are used, in order to keep the
number of view-image pixels per facet approximately
constant.
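
The resolution hierarchy described above can be sketched as follows. This is only an illustration under stated assumptions: the 5-tap binomial kernel stands in for the Gaussian kernels, and the names `smooth`, `build_pyramid`, `level_for_distance` and the parameter `base_distance` are hypothetical, not taken from Terrain-Calc. The level rule assumes that doubling the distance halves the projected facet size, so one coarser level is used per doubling.

```python
import math
import numpy as np

# 5-tap binomial kernel, a cheap stand-in for a Gaussian (assumption).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def smooth(z):
    """Separable smoothing of a 2-D elevation grid with replicated edges."""
    pad = len(KERNEL) // 2
    padded = np.pad(z, pad, mode="edge")
    # Filter along each row, then along each column.
    rows = np.apply_along_axis(np.convolve, 1, padded, KERNEL, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, KERNEL, mode="valid")

def build_pyramid(z, levels):
    """Level 0 is the full-resolution z(x,y); each further level is
    smoothed and decimated by two in both directions."""
    pyramid = [np.asarray(z, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(smooth(pyramid[-1])[::2, ::2])
    return pyramid

def level_for_distance(distance, base_distance, levels):
    """Pick one coarser level per doubling of distance beyond
    base_distance (hypothetical rule), clamped to the coarsest level."""
    ratio = max(distance, base_distance) / base_distance
    return min(int(math.floor(math.log2(ratio))), levels - 1)
```

For example, `build_pyramid(z, 3)` on an 8 x 8 grid yields grids of 8 x 8, 4 x 4 and 2 x 2 samples, and `level_for_distance(5.0, 1.0, 4)` selects level 2.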


This technique produces acceptable results when
images are viewed in isolation, but introduces annoying
artifacts in motion sequences. The problem is that the
transitions between levels of the terrain hierarchy occur
at different places in each image, depending on the
distances between the facets and the camera. Such
transitions occur in jumps. We have implemented an
improvement that performs linear interpolation between
levels in the resolution hierarchy to eliminate the abrupt
transitions.
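
One way such an interpolation might look is sketched below. The text does not give the interpolation formula, so everything here is an assumption: the level is treated as a continuous quantity that grows by one per doubling of distance, and the elevation is blended linearly between the two neighboring levels. The names `fractional_level` and `blended_elevation` are hypothetical.

```python
import math
import numpy as np

def fractional_level(distance, base_distance, levels):
    """Continuous pyramid level: 0 at base_distance, +1 per doubling."""
    ratio = max(distance, base_distance) / base_distance
    return min(math.log2(ratio), levels - 1)

def blended_elevation(pyramid, row, col, distance, base_distance):
    """Linearly interpolate the elevation between the two nearest levels."""
    level = fractional_level(distance, base_distance, len(pyramid))
    lo = int(math.floor(level))
    hi = min(lo + 1, len(pyramid) - 1)
    t = level - lo
    # Level k was decimated k times, so full-resolution indices
    # shrink by a factor of 2**k.
    z_lo = pyramid[lo][row >> lo, col >> lo]
    z_hi = pyramid[hi][row >> hi, col >> hi]
    return (1.0 - t) * z_lo + t * z_hi
```

Because the blend weight `t` varies smoothly with distance, a facet no longer switches elevation source abruptly from one frame to the next.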


Currently, Terrain-Calc only handles the very
restricted class of geometric models of the form z(x,y). A
future extension will allow a mixture of 3-D modeling
techniques to be used together, using a Z-buffer (Catmull
1974) or A-buffer (Carpenter) to merge the results of the
disparate modeling systems.
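
As a rough illustration of the Z-buffer idea only (this is not part of Terrain-Calc, and the function name is hypothetical), two renderings that each carry a per-pixel depth can be merged by keeping the nearer sample at every pixel:

```python
import numpy as np

def zbuffer_merge(color_a, depth_a, color_b, depth_b):
    """Per pixel, keep the sample that is nearer to the camera.

    Depths are distances from the camera; a pixel that a renderer did
    not cover can be given depth np.inf so that it never wins.
    """
    nearer_a = depth_a <= depth_b
    color = np.where(nearer_a, color_a, color_b)
    depth = np.where(nearer_a, depth_a, depth_b)
    return color, depth
```

The merged depth array can be fed into a further merge, so any number of modeling systems can contribute to one view.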


REFERENCES 


Carpenter, Loren, "The A-Buffer, an Antialiased
Hidden Surface Method," ACM Computer Graphics,
July 1984, pp. 103-108.

Catmull, Edwin, "A Subdivision Algorithm for
Computer Display of Curved Surfaces," University
of Utah, Salt Lake City, December 1974.

Catmull, Edwin and Alvy Ray Smith, "3-D
Transformations of Images in Scanline Order," ACM
SIGGRAPH '80 Conference Proceedings, July 1980,
pp. 279-285.

Quam, Lynn, "Computer Comparison of Pictures,"
Stanford Artificial Intelligence Project Memo No.
144, May 1971.

Wright, Thomas, "A Two-Space Solution to the Hidden
Line Problem for Plotting Functions of Two
Variables," IEEE Transactions on Computers,
January 1973, pp. 28-33.



Anderson, David, "Hidden Line Elimination in
Projected Grid Surfaces," ACM Transactions on
Computer Graphics, October 1982, pp. 275-288.

Blinn, James, "Simulation of Wrinkled Surfaces,"
SIGGRAPH Proceedings, August 1978, pp. 286-292.

Burt, Peter, "Fast Filter Algorithms for Image
Processing," Computer Graphics and Image
Processing, May 1981, pp. 20-51.




Converting Feature Values to Evidence
George Reynolds, Deborah Strahman, Nancy Lehrer

Computer and Information Science Department
University of Massachusetts at Amherst


Abstract 

In this paper we describe the mathematical foundations of a
knowledge representation framework within the domain of vision
using the theory of evidential reasoning as developed by Dempster
and Shafer. We describe an evidential form of reasoning and
combination rule which is shown to be equivalent to Dempster's
rule but linear with respect to the number of elements of the
frame of discernment. This is a tremendous computational
advantage over the general theory which provides a decision theory
exponential with respect to the number of elements in the frame.
A preliminary experiment in image interpretation is presented
to illustrate the use of the theory.


1.  Introduction 


In this paper we describe the mathematical foundations of a
knowledge representation framework within the domain of vision
using the theory of evidential reasoning as developed by
Dempster [1967] and Shafer [1976]. This paper is only a summary.
No proofs are given and the examples are only outlined.
Complete details will be given in a forthcoming technical report.

The representation has two components. The first part is
static, and explicitly associates measurable properties (e.g.
features) of the image data, via knowledge sources, to labels which
are to be assigned to abstractions of the image data. The second
part uses this static representation, a frame of discernment, and
the theory of evidence as developed by Shafer to combine the
results from the knowledge sources and arrive at a consensus
opinion for the purpose of determining the correct label of the
image abstraction.

The association between measurements and potential labels
of an image abstraction is made using the notion of a mass
function as defined by Shafer [1976]. In this paper mass functions
are generated using explicit knowledge about the image domain
in question using possibility functions (see section 6). Methods
for making such an association have been made before (Lowrance
[1982], Wesley and Hanson [1985], Strat [1984]). However
these methods require that the range of values over which the
mass functions are defined be either explicitly or implicitly
discretized into "feature propositions" or subintervals of the feature
variable.

In our approach no such discretization is required and the
process of creating a mass function is defined as a continuous
function of a feature value. Moreover we will define a combination
process and a choice of decision rules which are linear with
respect to the number of elements of the frame of discernment.
This is a tremendous computational advantage over the general
theory which provides a decision theory exponential with respect
to the number of elements in the frame.

Our approach is only one example of how mass functions
might be generated from measurements on features. It is doubtful
that there is a domain independent way of creating mass functions
from feature measurements, since the domain imposes constraints
on the semantics of the desired inference process, and
some ways of generating mass functions may not satisfy these
constraints.

The formalism we develop in this paper is not a strictly
probabilistic approach but uses a product rule to combine information
from knowledge sources and in this sense can be viewed
as a "generalized Bayesian" method. A recent paper of Hummel
[1985] clarifies the relationship of Dempster's rule to more
traditional "Bayesian" methods in terms of sets of "experts". The
correct use of the Dempster-Shafer formalism involves carefully
stating the assumptions about the domain to which it applies
and representing these assumptions within the knowledge network
via possibility functions. If this representation is faithful to
the assumptions, then Dempster's rule will focus mass on consistent
statements and thus "preserve consistency". An important
aspect of the representational formalism involves the use of a
"conflict value" which detects when an assumption has been
violated and is used as a representation of uncertainty within the
system. The system also provides a simple mechanism to isolate
groups of propositions which are mutually consistent.

The combination process can also be viewed as generating a
new higher level "combined" feature, such as might be obtained
combining intensity, texture, color and edge features. This is a
very useful attribute of the system since it separates the
combination process from some other process or processes which would
be responsible for executing a decision procedure in order to
determine which label is correct on the basis of this new combined
feature.

The effect of applying Dempster's rule to mass functions
which involve different partially restricted subsets is one of
focusing that mass or belief onto the consistent subsets of
possibilities. Used this way, the ideas of Shafer and Dempster become
a process of possibilistic reasoning rather than probabilistic
reasoning. The mass attributed to impossible situations is
viewed as conflict or mass assigned to an "unknown" object; it
may be monitored to evaluate the correctness of the knowledge
sources, and the knowledge base (frame of discernment).




functions needs to be modified to correctly reflect the
assumptions of the domain, or that an event has occurred which was
not included in the frame and $\Theta$ needs to be enlarged.

If a set $A \subseteq \Theta$ is assigned mass t, then any set B with
$A \subseteq B \subseteq \Theta$ should believe an amount at least t that it
contains the right answer. In addition any set $C \subseteq \Theta$ with
$C \cap A = \emptyset$ should have the extent to which the evidence fails
to refute C as containing the right answer reduced by at least t.
This leads to the following definition:

Definition Given a mass function $M : 2^\Theta \to [0, 1]$, the support
and plausibility of each $A \subseteq \Theta$ is defined as follows:

$$SPT(A) = \sum_{X \subseteq A} M(X)$$

$$PLS(A) = 1 - \sum_{X \cap A = \emptyset} M(X).$$
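
The two definitions can be checked on a small example. The sketch below assumes a mass function stored as a dictionary from `frozenset` focal elements to masses; this representation, the function names and the example masses are illustrative assumptions, not taken from the paper.

```python
def support(mass, a):
    """SPT(A): total mass on focal sets X that are subsets of A."""
    return sum(m for x, m in mass.items() if x <= a)

def plausibility(mass, a):
    """PLS(A): one minus the total mass on focal sets X disjoint from A."""
    return 1.0 - sum(m for x, m in mass.items() if not (x & a))

# Hypothetical mass function on the frame {sky, grass, roof}.
mass = {
    frozenset({"sky"}): 0.5,
    frozenset({"sky", "grass"}): 0.3,
    frozenset({"sky", "grass", "roof"}): 0.2,
}
```

Here `support(mass, frozenset({"sky"}))` is 0.5 while `plausibility(mass, frozenset({"sky"}))` is 1.0, since no focal set excludes sky. Enumerating focal sets this way is what becomes exponential in the size of the frame in the general theory.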

3. The structure of an evidential knowledge base


Let us review some of the limitations of inferencing using Bayesian
probability models, especially in the domain of vision. Central to
the problem of image understanding is how to convert a feature
value into evidence for an object or set of objects given knowledge
about the relationship of a feature to the set of objects.
For example Bayes rule is one way of making such a conversion:
Suppose we are given a set of labels a, b, c, ... and probabilities
p(a), p(b), p(c), ... and for feature f, p(f | a), p(f | b), p(f | c), ...
and p(f). Then


$$p(a \mid f) = \frac{p(f \mid a)\, p(a)}{p(f)}$$
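
For concreteness, the conversion can be sketched as follows, under the added assumption that the labels are mutually exclusive and exhaustive so that p(f) can be obtained by summing p(f | a) p(a) over the labels. The function name and the numbers in the usage note are hypothetical.

```python
def bayes_posteriors(priors, likelihoods):
    """Return p(a | f) for every label a, given p(a) and p(f | a)."""
    # p(f), assuming the labels partition the possible outcomes.
    evidence = sum(likelihoods[a] * priors[a] for a in priors)
    return {a: likelihoods[a] * priors[a] / evidence for a in priors}
```

With priors `{"sky": 0.5, "grass": 0.5}` and likelihoods `{"sky": 0.8, "grass": 0.2}` the posterior for sky is 0.8. Note that every input, including the priors, must be known before the rule can be applied.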


This formula presents a number of problems. Consider the
problem of assigning labels to the regions of some image
segmentation. If for example the labels are partof-sky, partof-road,
partof-grass, ... then p(partof-sky) would be the probability that
a region in the segmentation is correctly labeled partof-sky. Thus
not only is it necessary to have knowledge about the frequency
of the event sky, it is also necessary to have knowledge about
the whims of the specific segmentation algorithm being used
and its effect on the prior probability p(partof-sky). Experience
with segmentation algorithms suggests that this information is
extremely difficult to obtain.

On the other hand, it is possible to obtain some knowledge
about p(f | a) although the statistics used to estimate
this number are subject once again to the variations imposed
by the segmentation algorithm being used, variables which are
very difficult to characterize statistically. Thus although there
is some useful information in these statistics there is a
tremendous amount of uncertainty in the histograms. In addition it
is possible to acquire estimates of p(f) and the Rule System of
Hanson, Riseman, et al. [1985] uses the ratio

$$\frac{p(f \mid a)}{p(f)}$$

and approximations of this ratio as a "vote" for a when the
feature f is observed. However the rule system is limited in
its ability to handle uncertainty, partial ignorance and conflict
between knowledge sources.

Consider the problem of labeling an image with labels $a \in A$.
The distribution p(f | a) provides some information about how
often a feature f occurs with respect to the object a. If the
frequency is high, for some value f, then at least we don't want
to rule out the possibility that the correct label to be assigned is
a. On the other hand the feature value f may occur frequently
for many objects and so the only knowledge we may have is of
the form: Given the observation f, then we don't want to rule out
the possibility that the correct label is in the set A of labels for
which that feature occurs frequently. Combining the information
from many such knowledge sources then leads to the correct
interpretation.

Definition Given a frame of discernment $\Theta$ and a feature space
FS, a possibility function

$$ps(f \mid a) : FS \to [0, 1]$$

is a function defined on a feature space FS for each $a \in \Theta$ which
has the interpretation: $ps(f \mid a)$ is the extent to which we don't
want to rule out a if we make the observation $f \in FS$.
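
A possibility function can be represented in many ways. As one illustration, the sketch below assumes a piecewise-linear form given by a sorted list of (feature value, possibility) breakpoints; this representation, the helper name `make_possibility` and the breakpoints in the usage note are assumptions, not taken from the paper.

```python
from bisect import bisect_right

def make_possibility(points):
    """Build ps(f | a) for one label a from (feature value, possibility)
    breakpoints with strictly increasing feature values."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def ps(f):
        # Hold the end values constant outside the breakpoint range.
        if f <= xs[0]:
            return ys[0]
        if f >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, f)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (f - x0) / (x1 - x0)

    return ps
```

For instance, `make_possibility([(0, 0.0), (10, 1.0), (30, 1.0), (50, 0.0)])` returns a trapezoid that does not rule out the label at all for feature values between 10 and 30, and rules it out completely above 50. Because the function is continuous in f, no discretization of the feature axis is needed.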

Example Figure 3.1 contains an example of a segmentation used
to generate the histograms in figures 3.2, 3.3 and 3.4. Regions in
3 segmentations of similar images were selected and identified
as foliage, grass, sky, shutter and roof. The features of raw-blue
mean, short line density, edge density, centroid row position and
excess green mean were then histogrammed. In the figures, the
possibility functions are overlaid on these histograms and they
are presented in the order foliage, grass, sky, shutter and roof.


Figure 3.1: Image Segmentation

Figure 3.2: Histogram and possibility function for edge density

Figure 3.3: Histogram and possibility function for short line density

Figure 3.4: Histogram and possibility function for centroid row position


Examining this example we can identify a number of sources
of uncertainty. First of all the statistical information is
inherently incomplete. It seems clear that there is a much wider
variation in the data, even in the narrow context in which these
images are a sample. Thus the specification of the possibility
function will involve some uncertainty. Second the set $\Theta$ is also
inherently incomplete. We simply may not have anticipated all
the objects we will encounter in a specific context. These
uncertainties translate into a requirement that the system be able
to say "I don't know" in at least two important ways.

First, an individual knowledge source may be uncertain about
the applicability or relevance of a measurement both with
respect to the information in the knowledge base and with
respect to its own internal processes. In other words an individual
knowledge source needs the ability to say "I don't know". (This
is separate from the fact that the value supplied by the
knowledge source might be in error.)

Secondly, even if each individual knowledge source is
completely certain about how its information should be interpreted,
there may be conflict and inconsistency between the knowledge
sources. In other words we need the ability for the knowledge
sources to collectively say: "I don't know". This type of
inconsistency can come in a number of forms. For example it may
be that 20 knowledge sources are all supplying information at
a given time and one of the knowledge sources being in
conflict with the other 19 is not to be interpreted as completely
invalidating the outcome of the combination process. We need
a measure of the degree to which the knowledge sources are
collectively uncertain about the consensus opinion. What action to
be taken as a function of this uncertainty is yet another matter.




4. Converting Measurements into Mass Functions

We are now in a position to define precisely what we mean by a
frame of discernment. A frame is designed to capture the
relationships between the objects in some context and the features
in that context which pertain to reasoning about those objects.
As the context changes, the objects, the features and the
relationships between the features and the objects can be expected
to change.

Definition: A knowledge source is a function

$$ks : FS \to M(2^\Theta)$$

where $M(2^\Theta)$ is the set of all mass functions on $\Theta$.

Definition: A frame of discernment (or a context) is a
specification of a set $\Theta$ and a collection of knowledge sources

$$ks_1 : FS_1 \to M(2^\Theta), \ldots, ks_n : FS_n \to M(2^\Theta).$$

Each feature space can be thought of as containing quantities
that are associated with some observable and quantifiable aspect
of the knowledge we are bringing to bear on the problem of
answering the question which the frame is designed to answer.
The set of all feature spaces of potential interest, together with
their knowledge sources, forms a frame of discernment. In general
this includes any aspect of
a domain or world about which information may be obtained in
order to help decide which answer is correct. In our approach to
reasoning about one's environment, various types of knowledge
sources provide the partially processed information, based on
their environmental observations, about the "evidence for" or
"belief in" the propositions represented by the subsets of $\Theta$.

Suppose we are given a set $\Theta$ and for each $a \in \Theta$ a possibility
function. We will now give a construction which generates a
knowledge source from this possibility function. In fact there are
a number of ways to modify this construction which also yield
mass functions which may in fact turn out to be very useful in
some circumstances. Thus the construction we give here is a
starting point for a whole class of constructions. The advantage
of the method we describe is that it allows the computation of
the supports and plausibilities for the elements of $\Theta$ without the
need of the power set. Moreover we will describe a combination
function which yields the same result as Dempster's rule but is
linear with respect to the number of elements of $\Theta$. This will
lead to a decision rule based on the supports and plausibilities
whose complexity is linear in the number of elements of $\Theta$.

Definition Suppose we are given a set $\Theta$ and for each $a \in \Theta$ a
possibility function

$$ps(f \mid a) : FS \to [0, 1],$$

from some feature space FS to [0, 1]. Then for each $A \subseteq \Theta$
define

$$m_0(A \mid f) = \prod_{a \in A} ps(f \mid a) \prod_{a \in \Theta - A} (1 - ps(f \mid a)).$$

Now the function $m_0(A \mid f)$ is not a mass function since

$$m_0(\emptyset \mid f) = \prod_{a \in \Theta} (1 - ps(f \mid a))$$

is not necessarily equal to zero. However the function does have
the other two properties of a mass function:

1. $0 \le m_0(A \mid f) \le 1$,

2. $\sum_{A \subseteq \Theta} m_0(A \mid f) = 1$.

The first statement is obvious and the following lemma directly
implies the second.

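Lemma 4.1 If $(x_1, \ldots, x_n)$ is a sequence of numbers and
$N = \{1, \ldots, n\}$ then

$$\sum_{A \subseteq N} \prod_{i \in A} x_i \prod_{j \in N - A} (1 - x_j) = 1$$

where we define

$$\prod_{i \in \emptyset} x_i = 1.$$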
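
The construction and the lemma can be checked numerically. The sketch below enumerates every subset of a small frame, which is exponential and is done here only to verify the identity; the function names and the possibility values in the usage note are hypothetical.

```python
from itertools import combinations

def subsets(frame):
    """Yield every subset of the frame as a frozenset."""
    items = sorted(frame)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def m0(ps_values):
    """m0(A | f) for every A, given ps(f | a) for one observed f."""
    mass = {}
    for subset in subsets(ps_values):
        value = 1.0
        for label, ps in ps_values.items():
            # Factor ps for members of A, (1 - ps) for non-members.
            value *= ps if label in subset else (1.0 - ps)
        mass[subset] = value
    return mass
```

With `{"foliage": 0.9, "grass": 0.6, "sky": 0.1}` the eight values sum to 1, as the lemma states, and the empty set receives 0.1 * 0.4 * 0.9 = 0.036.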

Consider what it means for the empty set to receive a
non-zero value in terms of the possibility functions generating $m_0$.
It means simply that the consensus of opinion of the possibility
functions is that the feature value in question rules out every
element of $\Theta$ with respect to the current state of the knowledge
base (as represented by the possibility functions). This could be
either because the knowledge source is in error or the knowledge
base is incomplete. The decision as to which of these conditions
holds is external to the processes of the inference network. All
that should be required of it is that it return (partially) the
answer "unknown".

Therefore we add to $\Theta$ a new element unk and define

$$m(A \cup \{unk\} \mid f) = m_0(A \mid f).$$

This then, is a mass function on $\Theta \cup \{unk\}$.

We could have recast our definition by defining a new
object (a pre-mass function?) which is allowed to assign non-zero
mass to the empty set (see Hummel [1985]). Dempster's rule can
be defined for these objects (just take out the re-normalization)
and there is a simple mapping between these objects and mass
functions. However, this approach requires doubling the
notation. The addition of {unk} requires no change in notation or
conceptualization, eliminates the need to re-normalize until it
is appropriate and the "conflict" value generated by Dempster's
rule is simply the mass assigned to {unk}.

The  following  theorem  summarizes  some  of  the  relationships 
between  the  possibility  values  and  the  generated  mass  function. 

Theorem 4.1 Suppose we are given a set $\Theta$ and for each $a \in \Theta$
a possibility function

$$ps(f \mid a) : FS \to [0, 1],$$

from some feature space FS to [0, 1]. Then defining the mass
function $m(A \cup \{unk\} \mid f)$ as above,

1. $spt(A) = 0$ if $unk \notin A$,

2. $pls(\{a\}) = ps(f \mid a)$ for any $a \in \Theta$,

3. $spt(\{a, unk\}) = \prod_{b \in \Theta} (1 - ps(f \mid b)) + ps(f \mid a) \prod_{b \in \Theta - \{a\}} (1 - ps(f \mid b))$ for any $a \in \Theta$,

4. $pls(A \cup \{unk\}) = 1$ for any $A \subseteq \Theta$.


Given a mass function as defined above, we can define a
function on $2^\Theta$ by the formula

$$m\text{-}norm(A \mid f) = \frac{m(A \cup \{unk\} \mid f)}{1 - m(\{unk\} \mid f)}$$

for any non-empty set A and $m\text{-}norm(\emptyset \mid f) = 0$.


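Theorem 4.3 m-norm is a mass function on $2^\Theta$.

With respect to the mass function m-norm, the support of each
singleton is

$$spt(\{a\}) = \frac{ps(f \mid a) \prod_{b \in \Theta - \{a\}} (1 - ps(f \mid b))}{1 - \prod_{b \in \Theta} (1 - ps(f \mid b))}$$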


Thus we have shown the connection between possibility
functions and mass functions. The last theorem in particular shows
that with respect to the elements of $\Theta$ the supports and
plausibilities can be computed directly from the possibility functions
without the need of the power set. Thus any decision rule based
on the support and plausibility has a complexity which is
proportional to the number of elements of $\Theta$ (see Wesley and Hanson
[1985]).
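
A sketch of this linear-time computation follows. It evaluates the singleton support and plausibility under m-norm directly from the possibility values, using prefix and suffix products so that each "product over all other labels" costs constant time. The function name is hypothetical, and the prefix/suffix device is one convenient choice, not necessarily the authors' implementation.

```python
def singleton_support_plausibility(ps_values):
    """Return {label: (spt, pls)} under m-norm in time linear in the
    number of labels, given ps(f | a) for one observed f."""
    labels = list(ps_values)
    q = [1.0 - ps_values[a] for a in labels]
    n = len(q)
    prefix = [1.0] * (n + 1)  # prefix[i] = q[0] * ... * q[i-1]
    for i in range(n):
        prefix[i + 1] = prefix[i] * q[i]
    suffix = [1.0] * (n + 1)  # suffix[i] = q[i] * ... * q[n-1]
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] * q[i]
    conflict = prefix[n]  # mass on {unk}
    if conflict >= 1.0:
        raise ValueError("every label is ruled out; m-norm is undefined")
    norm = 1.0 - conflict
    result = {}
    for i, a in enumerate(labels):
        others = prefix[i] * suffix[i + 1]  # product over b != a
        result[a] = (ps_values[a] * others / norm, ps_values[a] / norm)
    return result
```

For `{"foliage": 0.9, "grass": 0.6, "sky": 0.1}` the conflict is 0.036, so foliage has plausibility 0.9 / 0.964 and support 0.324 / 0.964.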

The next two theorems show that term-wise product of
possibility functions is equivalent to combining using Dempster's
rule. We next observe that every mass function generated by a
possibility function is a separable mass function.


5. Possibility functions and Dempster's rule


Theorem 4.4 Suppose we are given a set $\Theta$ and for each $a \in \Theta$
a possibility function

$$ps(f \mid a) : FS \to [0, 1],$$

from some feature space FS to [0, 1]. Then the mass function
m-norm generated by this possibility function is a separable
mass function.


In this section we make the connection between possibility
functions and Dempster's rule. In particular we will show that
combining possibility functions by term-wise product and generating
a mass function yields the same result as individually generating
mass functions and combining using Dempster's rule.

Let us briefly consider some results which are useful for
deriving computationally efficient ways of computing Dempster's
rule, supports and plausibilities. The first observation is that
if we are presented with a set of mass functions on $\Theta \cup \{unk\}$
to combine, and they only assign non-zero mass to subsets of
$\Theta \cup \{unk\}$ which contain unk, then the amount of mass which
accumulates on {unk} is exactly the conflict value of the n-wise
combination as defined by Shafer.

Next we observe that normalization and combination
commute with each other.


Theorem 5.2 Suppose we are given a set $\Theta$ and for each $a \in \Theta$
possibility functions

$$ps_1(f \mid a) : FS \to [0, 1],$$

and

$$ps_2(f \mid a) : FS \to [0, 1],$$

from some feature space FS to [0, 1]. Let

$$ps(f \mid a) = ps_1(f \mid a) \cdot ps_2(f \mid a).$$

Then defining the mass functions $m_1$, $m_2$ and $m$ in terms of
their possibility functions as above,

$$m = m_1 \oplus m_2.$$

Theorem 5.1 Suppose $m_1(A \cup \{unk\} \mid f)$ and $m_2(A \cup \{unk\} \mid f)$
are mass functions derived from the possibility functions
$ps_1(f \mid a)$ and $ps_2(f \mid a)$. Then

$$(m_1 \oplus m_2)\text{-}norm = m_1\text{-}norm \oplus m_2\text{-}norm.$$
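
The equivalence between the term-wise product and Dempster's rule can be checked numerically on a small frame. In the sketch below, every focal set contains unk, so intersections are never empty and the orthogonal sum needs no renormalization. The brute-force enumeration is exponential and serves only as a check; all names and values are hypothetical.

```python
from itertools import combinations

UNK = "unk"

def mass_from_possibility(ps_values):
    """m(A u {unk} | f) for every subset A of the frame."""
    labels = sorted(ps_values)
    mass = {}
    for r in range(len(labels) + 1):
        for combo in combinations(labels, r):
            value = 1.0
            for a in labels:
                value *= ps_values[a] if a in combo else 1.0 - ps_values[a]
            mass[frozenset(combo) | {UNK}] = value
    return mass

def combine(m1, m2):
    """Orthogonal sum: multiply masses and intersect focal sets."""
    out = {}
    for x, mx in m1.items():
        for y, my in m2.items():
            out[x & y] = out.get(x & y, 0.0) + mx * my
    return out

ps1 = {"sky": 0.9, "grass": 0.2, "roof": 0.5}
ps2 = {"sky": 0.8, "grass": 0.3, "roof": 0.1}
product = {a: ps1[a] * ps2[a] for a in ps1}
direct = mass_from_possibility(product)
via_dempster = combine(mass_from_possibility(ps1), mass_from_possibility(ps2))
```

Here `direct` and `via_dempster` agree on every focal set up to rounding, while `direct` needs only one multiplication per label to form the combined possibility list.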


6. A preliminary experiment


Theorem 5.3 Suppose $m(A \cup \{unk\})$ is a mass function
derived from the possibility function ps. Then with respect to the
mass function m-norm,

$$pls(\{a\}) = \frac{ps(f \mid a)}{1 - \prod_{b \in \Theta} (1 - ps(f \mid b))}$$


We now present a discussion of some preliminary results of
interpreting outdoor house scenes. In this experiment, the question
asked by our system is: Which regions of this segmentation
represent foliage, grass, sky, shutter, and roof? To answer this
question a frame of discernment was developed heuristically based on
the histograms of several feature spaces for hand picked regions
from similar segmentations.


First we will outline the decision rule which was used to
interpret the image. Suppose we are given a mass function M
generated (say) by an evidence combination process described
in the previous sections. In this paper we will not develop a
decision theory based on the SPT and PLS, however we will
state the simple criterion which was used in the interpretation
experiment.

Definition Given a mass function $M : 2^\Theta \to [0, 1]$, the
decisiveness of each $A \in 2^\Theta$ is defined as

$$DEC(A) = SPT(A) - (1 - PLS(A)).$$

Note that DEC(A) is a number between -1 and 1. If the
SPT(A) is close to 1 then DEC(A) is close to 1 and if the
PLS(A) is close to 0 then DEC(A) is close to -1, and in
particular if SPT(A) = 0 and PLS(A) = 1 then DEC(A) = 0.
Thus if DEC(A) is close to 1 then the evidence supports A, if
DEC(A) is close to -1, the evidence tends to refute A, and if
DEC(A) is close to 0 then the evidence is indecisive. One simple
decision criterion then is to compute the decisiveness for each
singleton of $\Theta$ and take the element which has the maximum
value. That is select $a \in \Theta$ where

$$DEC(\{a\}) = \max\{DEC(\{b\}) \mid b \in \Theta\}.$$
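
The criterion is simple enough to state in a few lines. The sketch assumes the singleton supports and plausibilities are already available as (spt, pls) pairs; the function names and the numbers in the usage note are hypothetical. Ties are detected by exact equality, which is adequate for illustration only.

```python
def decisiveness(spt, pls):
    """DEC(A) = SPT(A) - (1 - PLS(A)), a number between -1 and 1."""
    return spt - (1.0 - pls)

def most_decisive(spt_pls):
    """Return the labels with maximal decisiveness and all DEC values."""
    dec = {a: decisiveness(s, p) for a, (s, p) in spt_pls.items()}
    best = max(dec.values())
    return {a for a, d in dec.items() if d == best}, dec
```

Given `{"sky": (0.7, 0.9), "grass": (0.1, 0.3), "roof": (0.0, 1.0)}` the decisiveness values are 0.6, -0.6 and 0.0: the evidence supports sky, tends to refute grass, and is indecisive about roof.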

This measure preserves the ordering implied by the possibility
values, and can be used even when the possibility values are
unavailable.

Now consider the situation where for a given feature value
to be interpreted by a particular knowledge source, the mass
assigned to the unknown object is not equal to zero. When
this happens we say the knowledge has internal conflict and
this value can be viewed as the "uncertainty" of the knowledge
source. This arises when every object is given a possibility value
less than 1 by the knowledge source. For example given the
possibility functions in figure 3.2 for edge-density, if the feature
value returned is above 5[illegible], then the knowledge source will in
effect say, "For this region there is no object defined," and assign
1.0 mass to the singleton {unknown}. The amount of internal
conflict can be calculated directly from the possibility function
by the following formula.

$$k(f) = \prod_{a \in \Theta} (1 - ps(f \mid a))$$
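
As a small sketch, the internal conflict of one knowledge source for one observed feature value is a single product over its possibility values. The function name is hypothetical.

```python
import math

def internal_conflict(ps_values):
    """Mass assigned to {unk}: the product of (1 - ps(f | a)) over all
    labels a, for one observed feature value f."""
    return math.prod(1.0 - ps for ps in ps_values.values())
```

The value is 1.0 when every label is ruled out completely and 0.0 as soon as any label has possibility 1.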

While combining the evidence provided from all knowledge
sources the uncertainty of a single knowledge source will be
also reflected in the combined conflict. Intuitively, it seems
reasonable that the "conflict" created by an individual knowledge
source should however be distinguished from the conflict created
by the consensus of knowledge sources.

One simple method for dealing with undefined knowledge
is to simply remove any knowledge source with a significantly
large internal conflict. Unfortunately this presents the problem
of determining an appropriate threshold for the internal conflict
and all information contained in that knowledge source is lost
by this process when the conflict is above that threshold.

One approach to the problem stated above is to condition
each possibility function by the degree to which its knowledge is
defined before the combination process. Such a conditioning is
obtained by using the plausibility of the mass function m-norm:


with k as defined above (see Theorem 6.1).

The "weighting" process allows a possibility function with
total uncertainty (for example ((aO.O), (10.0)) ) to have no effect
on the consensus of opinion, and a possibility function with no
uncertainty to have full effect on the consensus. The amount of
weight given to a possibility function is a continuous function
of the internal conflict in the knowledge source. Although this
approach deals partially with the problem discussed above, a
complete decision rule must be a function of the internal conflicts
of the knowledge sources, the combined conflict, the support and
plausibility.

For the experiment presented here, all knowledge sources are
assumed to be independent and we are most interested in a
decision based on the conflict created between knowledge sources.
Thus, in this test we have chosen to base the decision rule on the
combined conflict and the decisiveness, diminishing the effect of
the internal conflict in the manner described above. This leads
to the following decision procedure used for interpreting each
region of the image:

1. Run each knowledge source for a region and return a list
of possibility values defined over $\Theta$.

2. Re-normalize the possibility lists by the conflict internal
to each knowledge source.

3. Combine the possibility lists (via the multiplicative
parallel to Dempster's Rule) and convert the result into a mass
function defined over $2^\Theta$.

4. Re-normalize the resulting mass function by the combined
conflict.

5. Label each region based on the combined conflict measure
and the decisiveness values defined over the singleton
subsets of $\Theta$.

If the combined conflict was above some threshold C (in this
case 0.26) then the final labeling for that region is unknown.
Otherwise the final label was chosen by computing the
decisiveness values for each singleton. The singleton with largest
decisiveness value, or the subset containing the two or more
singletons with the same largest decisiveness value, was chosen as
the label for the region under discrimination. The results are
shown in figures 6.1 thru 6.9 where the regions are labeled
foliage, grass, foliage or grass, shutter, foliage or shutter, roof,
foliage or roof, sky, and unknown.
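
The final labeling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_region`, the dictionary of decisiveness values and the tie tolerance `tol` are hypothetical, and the combined conflict and decisiveness values are taken as already computed by the earlier steps.

```python
def label_region(combined_conflict, decisiveness, threshold=0.26, tol=1e-9):
    """Return the set of labels assigned to one region.

    combined_conflict: conflict remaining after all knowledge sources are combined.
    decisiveness: dict mapping each singleton label to its decisiveness value.
    threshold: the conflict threshold C (0.26 in the experiment described above).
    tol: hypothetical tolerance used to decide that two values are tied.
    """
    if combined_conflict > threshold:
        return {"unknown"}
    best = max(decisiveness.values())
    # Ties produce a multi-label subset such as {"foliage", "grass"}.
    return {label for label, d in decisiveness.items() if abs(d - best) <= tol}
```

For example, `label_region(0.1, {"foliage": 0.7, "grass": 0.7, "sky": 0.2})` returns the two-element subset containing foliage and grass, which corresponds to the "foliage or grass" label.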

7.  Conclusions 


In this paper we have considered mass functions generated from
possibility functions defined from the a priori statistics of
features and objects. It is obvious that the mass assignments
generated from possibility functions bear a great resemblance to
probabilities on sets of independent events. For the examples
given, this form is intuitively appealing as well as compact and
easily analyzed. However, not all relationships between image
features and their interpretations can be captured by the use of
possibility functions in the way that we have defined the
relationship between possibility functions and mass functions. We
will conclude by considering two situations where this occurs.


First, a coarsening or refinement of the frame of discernment
may be required. In this case the mass function on the refinement
can not necessarily be generated by a possibility function
over the refinement. For example, in the context of aerial
photographs, a measure of rectangularity may discern between
rectangular and non-rectangular objects but it is not appropriate to
use this measure to distinguish between potentially rectangular
objects (such as buildings or parking-lots). Therefore a single
frame of discernment and related knowledge sources can not be
used throughout the reasoning process and the system must be
able to manage mass functions of broader types than mentioned
here.

Shafer [1976] suggests that in situations where the combination
of mass functions produces a great deal of conflict the
individual mass functions can be discounted then recombined. An
example of this is uniform discounting, which reduces the mass
given to each proper subset and increases the mass given to Θ.
If the discounting factor and conflict are large enough then the
combined mass to each proper subset tends toward an "average"
of individual mass functions. The discounted mass function is
not necessarily separable into the simple mass functions of the
form described above, and thus the analysis using possibility
functions does not apply.
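
As an illustration of uniform discounting, the following Python sketch scales the mass of every proper subset by (1 - alpha) and moves the removed mass to Θ. The representation (a dictionary keyed by `frozenset`), the function name `discount` and the parameter name `alpha` are assumptions made for this example; the recombination step is not shown.

```python
def discount(mass, frame, alpha):
    """Uniformly discount a mass function by a factor alpha in [0, 1].

    mass: dict mapping frozenset subsets of `frame` to masses that sum to 1.
    frame: iterable of singleton labels (the frame of discernment).
    Every proper subset keeps (1 - alpha) of its mass; the rest goes to the frame.
    """
    theta = frozenset(frame)
    out = {A: (1.0 - alpha) * m for A, m in mass.items() if A != theta}
    out[theta] = (1.0 - alpha) * mass.get(theta, 0.0) + alpha
    return out
```

With `alpha = 0` the mass function is unchanged, and with `alpha = 1` all mass sits on Θ, so a fully discounted source has no influence on the combination.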

Thus mass functions generated by possibility functions form
only a proper subset of the mass functions which are applicable
in a general image understanding system; however, their simplicity
and computational advantages make them very attractive in
contexts where evidential reasoning and management of
uncertainty are required.

Acknowledgements 

We would like to thank Len Wesley, Al Hanson and Les
Kitchen for many invaluable conversations concerning this paper.
We would especially like to thank Les for suggesting that we
consider the combination rule for possibility functions. The work
reported here was supported by the Air Force under AFOSR
contract no. F49620-83-C-0099.

References 

A. P. Dempster (1968), "A generalization of Bayesian inference",
Journal of the Royal Statistical Society, Series B, vol. 30, 1968,
pp. 205-247.

A. Hanson, E. Riseman, J. Griffith, T. E. Weymouth (1984), "A
methodology for the development of general knowledge-based
vision systems", Proc. IEEE Workshop on Principles of
Knowledge Based Systems, Denver, Colorado, Dec. 1984, pp. 159-170.

R. A. Hummel (1985), "A viewpoint on the theory of evidence",
to appear.

J. D. Lowrance (1982), "Dependency-graph models of evidential
support", Ph.D. Thesis, University of Massachusetts, Amherst,
1982.

S. Y. Lu, H. E. Stephanou (1984), "A set-theoretic framework
for the processing of uncertain knowledge", AAAI-84, pp. 216-221.

G. Shafer (1976), A Mathematical Theory of Evidence, Princeton
University Press, 1976.

T. Strat (1984), "Continuous belief functions for evidential
reasoning", AAAI-84, pp. 308-313.

L. Wesley (1983), "Reasoning about control: the investigation
of an evidential approach", Proc. Eighth International Joint
Conference on Artificial Intelligence, Karlsruhe, West Germany,
August 1983, pp. 233-210.

L. Wesley (1984), "Reasoning about control: an evidential
approach", Tech. Report 374, SRI Artificial Intelligence Center,
1984.

L. Wesley (1985), Ph.D. Thesis, University of Massachusetts,
Amherst, in preparation.

L. Wesley, A. Hanson (1985), "The application of an evidential-
based technology to a high-level knowledge-based image
interpretation system", to appear.


Figure 4.7 Foliage or Shutter


Figure 4.4 Foliage or Grass


Figure 4.0 Foliage or Roof


BINOCULAR  IMAGE  FLOWS 


Allen M. Waxman

Computer Vision Laboratory
Center for Automation Research
University of Maryland
College Park, MD 20742

James H. Duncan

Flow Research Company
Silver Spring, MD 20910

• Present address: Thinking Machines Corporation,
245 First Street, Cambridge, MA 02142

ABSTRACT

The analyses of visual data by stereo and motion
modules have typically been treated as separate, parallel
processes which both feed a common viewer-centered
2.5-D sketch of the scene. When acting separately, stereo
and motion analyses are subject to certain inherent
difficulties. Stereo must resolve a combinatorial
correspondence problem and is further complicated by
the presence of occluding boundaries, while motion
analysis involves the solution of nonlinear equations and
yields a 3-D interpretation specified up to an undetermined
scale factor. A new module is described here
which unifies stereo and motion analysis in a manner in
which each helps to overcome the other's shortcomings.
One important result is a correlation between relative
image flow (i.e., binocular difference flow) and stereo
disparity; it points to the importance of the ratio
$\dot{\delta}/\delta$, rate of change of disparity $\dot{\delta}$
to disparity $\delta$, and its possible
role in establishing stereo correspondence. The importance
of such ratios was first pointed out by Richards
(1983). Our formulation may reflect the human perception
channel probed by Regan and Beverley (1979).

1. INTRODUCTION

In decomposing the visual information processing
task into several stages, it is the intermediate level which
is responsible for the recovery of surface shapes in a scene
(Marr 1982). It is often described as a set of “shape
from” modules which, acting independently and in parallel,
feed a viewer-centered 2.5-D sketch of the visual
field. Two of the most commonly studied and closely
related modules are shape from stereo (Koenderink and
van Doorn 1976; Marr and Poggio 1979; Mayhew and
Frisby 1981; Prazdny 1984; Pollard et al. 1985; Eastman
and Waxman 1985) and shape from monocular motion
(Koenderink and van Doorn 1975; Ullman 1979; Prazdny
1980; Longuet-Higgins and Prazdny 1980; Longuet-
Higgins 1981; Tsai and Huang 1981a,b; Waxman and
Ullman 1983; Waxman 1984; Waxman and Wohn 1984;
Wohn and Waxman 1985a,b; Subbarao and Waxman
1985; Buxton et al. 1984). However, when acting
independently, each of these processes suffers from certain
inherent difficulties. Stereo is faced with a combinatorial
correspondence problem plagued by the presence of
occluding boundaries (Grimson 1981; Poggio and Poggio
1984), while motion analysis involves the solution of
nonlinear equations and leaves the 3-D interpretation
specified up to an arbitrary scale factor (Waxman and
Ullman 1983). There is evidence, however, for a separate
channel of human visual processing in which stereo and
motion analyses may come together much earlier than at
the 2.5-D sketch. We formulate here a theory of time-
varying stereo in the context of “binocular image flows,”
where stereo and motion work closely in order to overcome
each other's shortcomings. Central to our approach
is the notion of relative flow (or “binocular difference
flow”), representing the difference between image velocities
of a feature as seen in the left and right images
separately. Neural organizations which perform this
“computation” have already been proposed (Regan and
Beverley 1979).
The fusion of stereo and motion into a single module
has been considered recently by others as well. Richards
(1983) demonstrated recovery of structure from orthographic
stereo and motion without knowledge of the
fixation distance. He also pointed out the importance of
measuring changes in quantities, such as disparity over
time, relative to their current values. Jenkin (1984)
considered a stereo matching process driven by the 3-D
interpretation of feature point velocities. Waxman and
Sinha (1984) proposed a “dynamic stereo” technique
based upon the relative flow derived from two cameras in
known relative motion, valid in the limit of negligible
disparity. The question of image motion aiding stereo in
the matching process was noted by Poggio and Poggio
(1984); and as will be shown below, a correlation between
binocular difference flow and disparity may support this
possibility.

We suggest a decomposition of our stereo-motion
module into five steps which begins where low-level vision
ends, i.e., it follows the stage of edge and point feature
extraction (and tracking over time) in the left and right
images separately.

Step 1: Monocular image flow recovery and flow segmentation
of the separate left and right image sequences, utilizing
the Velocity Functional Method (Waxman and
Wohn 1984) and overlap compatibility (Waxman 1984;
Wohn and Waxman 1985b). This procedure allows gross
correspondence to be established between analytic flow
regions in the left and right images. It also reveals the
depth and orientation discontinuities that often plague
stereo matching and surface reconstruction algorithms.

Step 2: Establishing correspondence between (previously
unmatched) left and right image features according to a
correlation between binocular difference flow and stereo
disparity. This process can be implemented in parallel
over the binocular field of view in the context of “local
support” within neighborhoods (Prazdny 1984; Pollard et
al. 1985; Eastman and Waxman 1985). This correlation
points to the importance of the ratio $\dot{\delta}/\delta$, rate
of change of disparity $\dot{\delta}$ to disparity $\delta$. The
importance of this ratio was first noted by Richards (1983).
A “rigidity assumption” for independently moving objects
in the scene also enters here.

Step 3: Use of disparity functionals defined in overlapping
neighborhoods to recover smooth surface structure
between the discontinuities detected from the monocular
flow analyses (Koenderink and van Doorn 1976; Eastman
and Waxman 1985).

Step 4: Recovery of rigid body space motions corresponding
to separate analytic flow regions utilizing the determined
surface structure and either monocular image flow
(or a cyclopean image flow). Separate surface patches
can then be grouped into rigid objects sharing the same
space motions. This process entails solving only linear
equations as a measure of its complexity.

Step 5: Use of the separate image flows to track features
and discontinuities over time. This allows refinement of
disparity estimates to “sub-pixel” accuracy by temporal
interpolation. It also allows the matching process to
focus attention onto areas where new image features will
be unveiled and old ones will disappear, i.e., at the
discontinuities and periphery of the field of view.

This last step suggests that, in the analysis of a
time-varying stereo sequence, once an initial correspondence
has been determined between left and right images,
it is not necessary to establish correspondence anew for
the entire image pair at subsequent times. Most of the
image features merely flow to new locations which can be
predicted. Matching need only be performed on new
features which enter the visible field from the periphery
and from behind occluding boundaries.

In this paper we formulate several of these steps
toward stereo-motion fusion. Section 2 reviews the basic
monocular image flow relations for rigid bodies in motion.
The importance of locally second-order flows and boundaries
of analyticity (i.e., weak and strong flow discontinuities)
is stressed as it is important for the binocular
flow analysis that follows. In Section 3 we develop the
theory of binocular image flows in the context of a parallel
stereo configuration, imaging a scene of rigid objects in
motion. A correlation is derived between relative flow
(binocular difference flow) and stereo disparity, laying the
basis for a new kind of matching procedure. In Section 4
we utilize an experimental data set for a short stereo
sequence to obtain the measured binocular image flows at
one time instant. These flows are then filtered using the
Velocity Functional Method, and a flow segmentation is
derived in order to detect depth and orientation
discontinuities in the scene. This data is then used to confirm
the correlation between binocular flow and disparity
developed earlier. We conclude in Section 5 with a
discussion of what remains to be done in the construction of
a complete stereo-motion fusion module.


2. MONOCULAR IMAGE FLOWS


Investigations into the recovery of 3-D structure and
motion from time-varying monocular imagery have
proceeded along two rather distinct paths. One approach
has been concerned with the motion of discrete points
moving rigidly in space (Ullman 1979; Prazdny 1980;
Longuet-Higgins 1981; Tsai and Huang 1981a,b; Adiv
1984). The resulting 3-D interpretation is in the form of
rigid body motion parameters and relative depth of
points in space. The second approach treats the image
flow field as a whole (Koenderink and van Doorn 1976;
Longuet-Higgins and Prazdny 1980; Waxman and Ullman
1983; Wohn 1984; Waxman and Wohn 1985) in an
attempt to recover the rigid body motion parameters and
surface descriptions (slopes and curvatures) of entire
surface patches. Recently, work has begun on the 3-D
recovery of structure from non-rigid body motions
(Ullman 1983; Koenderink, private communication). Our
formulation of binocular image flows will follow the
continuous field approach developed for monocular flows
generated by textured objects in rigid body motion (Waxman
and Ullman 1983; Waxman 1984; Waxman and Wohn
1984; Wohn 1984).

We consider a scene as comprised of objects in
independent rigid body motion with respect to the
observer. The individual objects are imagined as
decomposed into surface patches visible to the observer, and
these surface patches in space project into neighborhoods
in the image. It is actually the surface texture and
shading which is observed under perspective projection in the
image. Due to the relative motion between object and
observer, the projected texture undergoes deformations
which reflect the image flow field. The theory of monocular
image flows, developed by Waxman and collaborators
(cf. References), provides techniques for the recovery of
flow fields and deformation parameters from evolving
contours, edge fragments and feature points in the
imagery, and for recovery of 3-D surface structure and
rigid body motion from these deformations. As these
ideas provide the starting point for binocular flow
analysis, they are reviewed in more detail here.




2.1  Image  Velocity  Relations 


As a textured, rigid object moves through space, the
evolving image sequence registered by a monocular
observer (e.g. a moving pin-hole camera) contains
information in the form of an image flow field. This image
flow is determined by the relative rigid body motion
between object and observer, as well as the structure of
the object's surface visible to the observer. Derivation of
this flow field follows that of Waxman and Ullman (1983).

We attribute the relative rigid body motion to an
observer represented by the spatial coordinate system
(X, Y, Z) in Figure 1. The origin of this system is
located at the vertex of perspective projection, and the
Z-axis is directed along the center of the instantaneous
field of view. The instantaneous rigid body motion of this
coordinate system is specified in terms of the translational
velocity $V = (V_X, V_Y, V_Z)$ of its origin and its
rotational velocity $\Omega = (\Omega_X, \Omega_Y, \Omega_Z)$.
The 2-D image sequence is created by the perspective
projection of the object onto a planar screen oriented
normal to the Z-axis. The origin of the image coordinate
system (x, y) on the screen is located in space at
(X, Y, Z) = (0, 0, 1); that is, the image is reinverted and
scaled to a focal length of unity.

Due to the observer's motion, a point P in space
(located by position vector R) moves with a relative
velocity $U = -(V + \Omega \times R)$. At each instant, point P
projects onto the screen as point p with coordinates
(x, y) = (X/Z, Y/Z). The corresponding image velocities
of point p are $(v_x, v_y) = (\dot{x}, \dot{y})$, obtained by
differentiating the image coordinates with respect to time
and utilizing the components of U for the time derivatives
of the spatial coordinates of P. The result is

$$ v_x = \left\{ x\,\frac{V_Z}{Z} - \frac{V_X}{Z} \right\} + \left[ xy\,\Omega_X - (1 + x^2)\,\Omega_Y + y\,\Omega_Z \right], \qquad (1a) $$

$$ v_y = \left\{ y\,\frac{V_Z}{Z} - \frac{V_Y}{Z} \right\} + \left[ (1 + y^2)\,\Omega_X - xy\,\Omega_Y - x\,\Omega_Z \right]. \qquad (1b) $$
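
A minimal Python sketch of relations (1a,b) is given below, together with a finite-difference check that moves a space point with the relative velocity U = -(V + Ω × R) for a short time and differences its projection. The function names and the time step `dt` are assumptions made for this illustration.

```python
def image_velocity(x, y, Z, V, Omega):
    """Image velocity (v_x, v_y) at image point (x, y) with depth Z, per (1a,b)."""
    Vx, Vy, Vz = V
    Ox, Oy, Oz = Omega
    vx = (x * Vz - Vx) / Z + (x * y * Ox - (1.0 + x * x) * Oy + y * Oz)
    vy = (y * Vz - Vy) / Z + ((1.0 + y * y) * Ox - x * y * Oy - x * Oz)
    return vx, vy


def finite_difference_velocity(P, V, Omega, dt=1e-6):
    """Numerical image velocity: move P by U*dt and difference the projection."""
    X, Y, Z = P
    Vx, Vy, Vz = V
    Ox, Oy, Oz = Omega
    # Components of the cross product Omega x P.
    cx, cy, cz = Oy * Z - Oz * Y, Oz * X - Ox * Z, Ox * Y - Oy * X
    # U = -(V + Omega x P), applied for a short time dt.
    X2, Y2, Z2 = X - (Vx + cx) * dt, Y - (Vy + cy) * dt, Z - (Vz + cz) * dt
    return (X2 / Z2 - X / Z) / dt, (Y2 / Z2 - Y / Z) / dt
```

The two functions should agree to within the truncation error of the forward difference, which gives a quick check on the signs in (1a,b).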

These equations define an instantaneous image flow
field, assigning a unique 2-D image velocity v to each
direction (x, y) in the observer's field of view. For the
moment, we shall consider only a single surface patch of
some object in the field of view. A small but finite surface
patch may be locally approximated by a quadric surface
in space as described by six parameters: two slopes, three
curvatures and an overall distance scale. If the surface
patch is described in this viewer-centered spatial
coordinate system by Z = Z(X, Y), then it is straightforward
to find the corresponding local representation
Z = Z(x, y) as a second-order polynomial in terms of
image coordinates. Of these six surface parameters, only
five can be recovered directly from the image flow field;
the overall scale factor is lost as it always appears in ratio
with the translational velocity V (Waxman and Ullman
1983). Moreover, the remaining five surface parameters
appear in product with the translational space motion.
The kinematic analysis developed by Waxman and
Ullman (1983) leads to a set of twelve algebraic equations
relating this 3-D structure and motion to derivatives
(through second order) of the image flow. Recovery of the
3-D information requires solution of nonlinear equations.

Fig. 1 - Spatial Coordinates Moving with a Monocular
Observer and the Monocular Image Coordinates

2.2 Second-Order Image Flows

In the recovery of surface structure and 3-D motion
from image flow, it is sufficient to describe an image flow
as a locally second-order flow field. This has implications
with regard to the surfaces which generate the flow itself.
For example, a planar surface patch
$Z = Z_0 + pX + qY$ may be described exactly as
$Z = Z_0 (1 - px - qy)^{-1}$ in image coordinates.
Substitution into the velocity equations above yields expressions
in the form of second-order polynomials. For planar
surfaces, such second-order flows are globally valid. On the
other hand, quadric surfaces generate flows which are not
simple polynomials in the image coordinates. However,
they may be locally approximated as second-order flows.
The coefficients of this second-order flow then determine
the slopes and (scaled) curvatures of the quadric surface
patch as well as its (scaled) space motion. In this
context, a complex surface is viewed as a composite of
overlapping planar and quadric patches. The image flow
associated with a smooth surface is, therefore, a slowly
varying (in terms of image coordinates) second-order flow
defined over a region of the image.
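
As a worked illustration of the planar case, substituting 1/Z = (1 - px - qy)/Z_0 into the x-component of the image velocity relations (1) and collecting terms gives a polynomial with no terms beyond second order. The expansion below is a sketch derived here from (1a) under that substitution; it is not quoted from the original text.

```latex
\begin{aligned}
v_x &= \frac{(x V_Z - V_X)(1 - px - qy)}{Z_0}
       + \left[ xy\,\Omega_X - (1 + x^2)\,\Omega_Y + y\,\Omega_Z \right] \\
    &= -\left( \frac{V_X}{Z_0} + \Omega_Y \right)
       + \frac{V_Z + p V_X}{Z_0}\, x
       + \left( \frac{q V_X}{Z_0} + \Omega_Z \right) y \\
    &\quad - \left( \frac{p V_Z}{Z_0} + \Omega_Y \right) x^2
       + \left( \Omega_X - \frac{q V_Z}{Z_0} \right) xy .
\end{aligned}
```

Every translational term appears divided by Z_0, which is the scale ambiguity noted earlier: only the ratio of V to the distance scale can be recovered from the flow.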

In order to recover the second-order flow approximation
for any neighborhood in the image, it is necessary to
have a sufficiently dense texture present in that
neighborhood. This texture gives rise to extended contours, edge
fragments and point features, all of which are convected
along and deformed by the local image flow. These
features serve to sample components of the flow field; in
particular, the contours and edges yield an estimate of
the flow in the direction normal to the contours
themselves. The Velocity Functional Method (Waxman and
Wohn 1984) may then be used to recover the local flow
from these sampled components.

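We model the components of the local velocity field
by second-order polynomials; hence, define the partial
derivatives of image velocity evaluated at a local origin as

$$ \mathbf{v}^{(i,j)} \equiv \left. \frac{\partial^{\,i+j}\, \mathbf{v}}{\partial x^i\, \partial y^j} \right|_0 . \qquad (2) $$

Then the components of instantaneous velocity in the
neighborhood are described by the two functionals

$$ v_x(x, y) = \sum_{i=0}^{2} \sum_{j=0}^{2} v_x^{(i,j)}\, \frac{x^i}{i!}\, \frac{y^j}{j!}, \quad (i + j \le 2), \qquad (3a) $$

$$ v_y(x, y) = \sum_{i=0}^{2} \sum_{j=0}^{2} v_y^{(i,j)}\, \frac{x^i}{i!}\, \frac{y^j}{j!}, \quad (i + j \le 2). \qquad (3b) $$

The polynomial coefficients can be obtained by fitting the
equations to the measured velocity at isolated feature
points or to the flow normal to image contours (Waxman
and Wohn 1984).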
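
The fitting step can be illustrated for the simplest case, where the full velocity is measured at isolated feature points. The Python sketch below is a plain least-squares fit of the six coefficients of one velocity component; it is not the Velocity Functional Method itself, it does not handle normal-flow constraints along contours, and all function names are assumptions made for this example.

```python
def basis(x, y):
    """Taylor monomials x^i y^j / (i! j!) for i + j <= 2."""
    return [1.0, x, y, 0.5 * x * x, x * y, 0.5 * y * y]


def evaluate(coeffs, x, y):
    """One velocity component from its six derivative coefficients."""
    return sum(c * b for c, b in zip(coeffs, basis(x, y)))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    sol = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * sol[c] for c in range(k + 1, n))
        sol[k] = (M[k][n] - s) / M[k][k]
    return sol


def fit_component(points, values):
    """Least-squares fit (normal equations) of six coefficients to samples."""
    rows = [basis(x, y) for x, y in points]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    Atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(6)]
    return solve(AtA, Atb)
```

The coefficients are ordered as the derivatives of orders (0,0), (1,0), (0,1), (2,0), (1,1) and (0,2). At least six feature points in general position are needed; more points give an overdetermined fit that averages out measurement noise.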

2.3 Boundaries of Analyticity

From equations (1) it is apparent that the flow
field is "functionally analytic" (i.e. twice differentiable)
wherever object surfaces Z(x, y) are twice differentiable.
The flow is non-analytic at points where Z or its first
partials are discontinuous, and where the relative space
motion parameters change. Such points occur along
occluding boundaries and structural edges where surface
orientation changes abruptly (e.g., the edges of a
polyhedron). Thus, an image flow field is naturally
partitioned into regions of analyticity separated by singular
contours (i.e., boundaries of analyticity). These analytic
regions are, in turn, decomposed into neighborhoods in
which the image flow is locally approximated as a
second-order flow. It is part of a complete image flow
analysis to delineate these boundaries of analyticity so
that 3-D interpretations can be assigned to the regions
within them. Figure 2 illustrates this partitioning of the
image flow field.

In order to detect the presence of a boundary of
analyticity in the flow field, we try to “analytically
continue” the flow from one neighborhood to the next. This
is accomplished by requiring the separate second-order
flow approximations determined in each neighborhood to
be “compatible” in an overlapping area common to both
neighborhoods (Wohn 1984; Wohn and Waxman 1985b).
The degree of compatibility between neighboring flow
approximations is measured relative to the agreement
between the individual approximations and the data from
which they are obtained. When neighboring flow
approximations are deemed “incompatible,” it is assumed that a
boundary of analyticity has been crossed. This
necessitates the splitting and merging of neighborhoods in order
to localize this discontinuity. The beginnings of a control
structure governing the automatic segmentation of flow
fields are presented in Section 4 below.

Fig. 2 - Partitioning the Velocity Image into Analytic
Regions Separated by Boundaries of Analyticity.
Analytic Regions are Comprised of Overlapping
Neighborhoods in which the Flow Field is Locally
Second Order.


2.4 Monocular Analysis of Binocular Flows

In the case of a binocular image sequence, the
monocular flow analysis described above is to be applied to the
left and right image sequences separately. But rather
than going so far as the 3-D inference from monocular
flow (Waxman and Ullman 1983) for each sequence, we
consider only the recovery and segmentation of the
separate image flows. This segmentation into analytic
regions (i.e., regions of slowly varying second-order flow)
allows gross correspondence to be established between
these regions in the left and right images. It also
delineates the depth and orientation discontinuities which
often plague stereo matching and surface reconstruction
algorithms.

This completes Step 1 of our stereo-motion fusion
module. The reconstructed flow fields for the left and
right images are brought together in the stage of
“binocular flow analysis” described next.

3.  BINOCULAR  IMAGE  FLOWS 

For simplicity, we restrict our analysis to the parallel
stereo configuration illustrated in Figure 3. The left and
right image planes lie in a common plane with the
fixation point located at infinity (i.e., the "eyes" point
straight ahead). The left and right coordinates, $(x_l, y_l)$
and $(x_r, y_r)$ respectively, have their origins at the centers
of their respective fields of view separated by a baseline
of magnitude b along the common direction of the
x-axes. Each image plane is positioned at a focal length of
unity with respect to a pin-hole located at the vertex of
projection for each separate camera/eye. This stereo
configuration is assumed to move rigidly with respect to
other moving objects in the scene. No allowance has been
made for vergence of the eyes (known or otherwise) in the
current formulation.

Consider the monocular flow analysis of Step 1
already performed separately on the left and right image
sequences. The analytic flow regions bounded by flow
discontinuities are assumed to be brought into
correspondence rather easily. This can be accomplished essentially
by matching the flow discontinuities between left and
right images. The correspondence is gross, but allows the
binocular flow analysis to focus attention on individual
regions. Each such region is assumed to correspond to a
smooth surface of a rigid body. Thus, we may associate
with each region a set of relative rigid body motion
parameters. However, for the sake of analysis, if we
ascribe the rigid body motion to the “monocular
observer”, as in Figure 1 and equations (1), then the rigid
body motion parameters for a given region are different
for the left and right cameras/eyes. This is due to the
fact that the left and right cameras/eyes are in motion
with respect to each other when relative motion between
object and observer is ascribed to the observer. If
according to the left coordinate system the rigid body motion
parameters of a region are $(V_l, \Omega_l)$, then in the right
coordinate system that same region has motion
parameters $(V_r, \Omega_r)$, where

$$ \Omega_r = \Omega_l , \qquad (5a) $$

$$ V_r = V_l - \Omega_l \times b\,\hat{\imath} , \qquad (5b) $$

and $\hat{\imath}$ is a unit vector in the common x-direction.

Thus, the image flow fields of the two eyes/cameras
differ in magnitude as well as distribution (due to stereo
disparity). And as both stereo disparity and monocular
flow vary inversely with depth, we should not be
surprised that binocular flow and disparity are related in
a simple way. In fact, we shall see that binocular flow is
synonymous with “rate-of-change of disparity.”


F 1  j  1  -  Spatiei  asd  boag*  Coordinate*  for  the  Sioorular 

Coofl{u/aboe  Space  Mexico  are  5 ben*  ter  Left,  Right 
and  Cyclop***  Systems  which  Mere  a  Rigid  Object. 

3.1 Relative Flow - Disparity Relation

Given the parallel stereo configuration, we have the
simple case of corresponding features lying along horizontal
epipolar lines. Thus, a feature located at position
$(x_l, y_l)$ in the left image at some instant of time is
located at $(x_r, y_r)$ in the right image, where

$$y_r = y_l, \qquad (6a)$$

$$\delta(x_l, y_l) \equiv x_r - x_l = b / Z_l(x_l, y_l), \qquad (6b)$$

$\delta(x_l, y_l)$ being the angular disparity between right and
left image positions of the feature at $(x_l, y_l)$ in the left
image. Note that over a particular analytic flow region,
the (horizontal) disparity forms an analytic scalar field
generated by the smooth depth function $Z_l(x_l, y_l)$. And
since the left and right coordinate systems are parallel,
the depth function for the corresponding region in the
right image may be expressed as

$$Z_r(x_r, y_r) = Z_r(x_l + \delta[x_l, y_l],\, y_l) = Z_l(x_l, y_l). \qquad (7)$$

Let us rewrite the monocular image velocity relations
(1) in terms of translation and rotation coefficient
matrices,

$$\mathbf{v}(x, y) = \frac{1}{Z(x, y)}\, T(x, y) \cdot \mathbf{V} + B(x, y) \cdot \mathbf{\Omega}, \qquad (8)$$

these $2 \times 3$ matrices being functions of image coordinates
alone with elements easily obtained from relations
(1). Now an expression like (8) may be associated with
each image in our stereo configuration; the coordinates,
motion parameters and depth function are, however,
different. In order to relate the left and right image flows
for a given region, we shall express both flows in terms of
the left coordinate system by using expressions (5,6,7).
Thus, the left image flow is given by

$$\mathbf{v}_l(x_l, y_l) = \frac{1}{b}\,\delta(x_l, y_l)\, T(x_l, y_l) \cdot \mathbf{V}_l + B(x_l, y_l) \cdot \mathbf{\Omega}_l, \qquad (9a)$$

while the right image flow is given by

$$\mathbf{v}_r(x_l + \delta, y_l) = \frac{1}{b}\,\delta(x_l, y_l)\, T(x_l + \delta, y_l) \cdot \left[ \mathbf{V}_l - \mathbf{\Omega}_l \times b\,\hat{\mathbf{i}} \right] + B(x_l + \delta, y_l) \cdot \mathbf{\Omega}_l. \qquad (9b)$$


Equations (9a,b) yield the image velocities of corresponding
features in the two cameras/eyes.

Now we define the "relative flow" (or binocular
difference flow) of features between the left and right
images as the difference between the "shifted flow fields",
the "shift" being associated with the disparity field:

$$\Delta\mathbf{v}(x_l, y_l; \delta) \equiv \mathbf{v}_r(x_l + \delta[x_l, y_l],\, y_l) - \mathbf{v}_l(x_l, y_l). \qquad (10)$$




Expanding the coefficient matrices of (9a,b) according
to equations (1), forming the relative flow (10) and
simplifying yields the following expressions for the components
of relative flow:

$$\Delta v_x(x_l, y_l; \delta) = \frac{V_Z}{b}\,\delta^2 + (y_l \Omega_X - x_l \Omega_Y)\,\delta, \qquad (11a)$$

$$\Delta v_y(x_l, y_l; \delta) = 0. \qquad (11b)$$

Forming the ratio of relative flow to disparity yields

$$\frac{\Delta v_x(x_l, y_l; \delta)}{\delta(x_l, y_l)} = \frac{V_Z}{b}\,\delta + (y_l \Omega_X - x_l \Omega_Y), \qquad (12a)$$

$$\frac{\Delta v_y(x_l, y_l; \delta)}{\delta(x_l, y_l)} = 0. \qquad (12b)$$
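As a numerical sanity check on the relative flow relations (11a,b), the following sketch evaluates the left and right image flows of (9a,b) for a single feature and compares their difference with the closed form. It assumes the usual form of the monocular relations (1), which are not reproduced in this section; the function names and sample values are illustrative only.

```python
def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def image_flow(x, y, inv_depth, V, W):
    """Image velocity v = (1/Z) T(x,y) V + B(x,y) W, as in relation (8).

    The entries of T and B are an assumed form of relations (1).
    """
    Vx, Vy, Vz = V
    Wx, Wy, Wz = W
    vx = inv_depth * (x * Vz - Vx) + x * y * Wx - (1 + x * x) * Wy + y * Wz
    vy = inv_depth * (y * Vz - Vy) + (1 + y * y) * Wx - x * y * Wy - x * Wz
    return vx, vy


def relative_flow(x, y, Z, b, V, W):
    """Relative flow (10) at left-image position (x, y) for depth Z."""
    delta = b / Z                                   # disparity, relation (6b)
    shift = cross(W, (b, 0.0, 0.0))                 # Omega x (b i)
    V_right = tuple(v - s for v, s in zip(V, shift))  # relation (5b)
    vlx, vly = image_flow(x, y, 1.0 / Z, V, W)              # (9a)
    vrx, vry = image_flow(x + delta, y, 1.0 / Z, V_right, W)  # (9b)
    return vrx - vlx, vry - vly, delta


def predicted_dvx(x, y, delta, b, V, W):
    """Closed-form horizontal relative flow, relation (11a)."""
    return (V[2] / b) * delta ** 2 + (y * W[0] - x * W[1]) * delta
```

For any choice of motion and depth, the horizontal component should agree with (11a) and the vertical component should vanish, up to rounding error.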


We shall interpret expressions (12a,b) momentarily.
But first note that this ratio of relative flow to disparity
is linear in the variables $x_l$, $y_l$ and $\delta$, with coefficients
proportional to the unknown parameters of relative
motion. The reader may verify for himself that, when
reexpressed in the cyclopean coordinate system (midway
between the two cameras/eyes), expressions (12) remain
unchanged. Thus, we may suppress the subscript "l" in
(12) and write instead,

$$\frac{\Delta v_x(x, y)}{\delta(x, y)} = \frac{V_Z}{b}\,\delta(x, y) + (y \Omega_X - x \Omega_Y), \qquad (13a)$$

$$\frac{\Delta v_y(x, y)}{\delta(x, y)} = 0, \qquad (13b)$$

with image coordinates and motion parameters
corresponding to the cyclopean coordinate system.

If we consider the relative flow in a small enough
neighborhood such that the underlying surface patch may
be treated as locally planar, then we have a simple
expression for the local disparity field,

$$\delta(x, y) = \frac{b}{Z_0}\,(1 - px - qy), \qquad (14)$$

where $Z_0$ is the depth to the plane measured along the
center of the cyclopean field of view, and $p$ and $q$ are
the components of local slope. Substituting (14) for the
disparity on the right-hand side of (13) yields the local
relative flow to disparity relations,

$$\frac{\Delta v_x(x, y)}{\delta(x, y)} = \frac{V_Z}{Z_0} - \left( p\,\frac{V_Z}{Z_0} + \Omega_Y \right) x - \left( q\,\frac{V_Z}{Z_0} - \Omega_X \right) y, \qquad (15a)$$

$$\frac{\Delta v_y(x, y)}{\delta(x, y)} = 0. \qquad (15b)$$

We see that locally, the relative flow to disparity ratio is a
linear function of image coordinates with coefficients
depending on the surface structure and relative motion
between object and observer.


3.2 Interpreting the Correlation

The correlation between relative flow $\Delta\mathbf{v}$ and
disparity $\delta$, presented in cyclopean coordinates in (13a,b),
is simple to interpret. Recall that we are considering
only a parallel stereo imaging geometry; hence, the epipolar
lines are horizontal (i.e., parallel to the x-axis). Now
the relative flow $\Delta\mathbf{v}$ represents the rate of separation of
a feature in one image from its match in the other
image. It is the rate of change of vector disparity. As a
feature and its match must always lie along some epipolar
line, its vertical disparity must remain zero in this
case. Thus, relation (13b) expresses the fact that a
feature and its match must flow perpendicular to epipolars
at the same rate in order to lie on a common epipolar.
In general, the rate of change of vertical disparity
must be such as to keep a feature and its match on an
epipolar line.

For our parallel stereo configuration, we may then
identify $\Delta v_x$ with the rate of change of (horizontal)
disparity and denote it by $\dot{\delta}$. Returning to expression
(6b) we have

$$\frac{\dot{\delta}}{\delta} = -\frac{\dot{Z}}{Z}. \qquad (16)$$

From $\dot{\mathbf{R}} = -(\mathbf{V} + \mathbf{\Omega} \times \mathbf{R})$ we have
$\dot{Z} = -V_Z - \Omega_X Y + \Omega_Y X$; hence,

$$\frac{\dot{Z}}{Z} = -\frac{V_Z}{Z} - (y \Omega_X - x \Omega_Y) = -\frac{V_Z}{b}\,\delta - (y \Omega_X - x \Omega_Y). \qquad (17)$$

Combining (17) with equation (16) yields for $\dot{\delta}/\delta$,

$$\frac{\dot{\delta}(x, y)}{\delta(x, y)} = \frac{V_Z}{b}\,\delta(x, y) + (y \Omega_X - x \Omega_Y), \qquad (18)$$

which is identical with relation (13a). Thus, this correlation
between relative image flows and stereo disparity is,
in fact, a relationship between disparity and its rate of
change.
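The interpretation of relative flow as a rate of change of disparity can be checked numerically. The sketch below moves a single space point according to the rigid motion law quoted above, differences its disparity $b/Z$ over a small time step, and compares the result with relation (18). The step size, the sample values and all function names are assumptions made for illustration.

```python
def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def point_velocity(R, V, W):
    """Velocity of a space point relative to the observer: -(V + W x R)."""
    c = cross(W, R)
    return tuple(-(v + ci) for v, ci in zip(V, c))


def disparity(R, b):
    """Horizontal disparity b / Z for the parallel configuration."""
    return b / R[2]


def disparity_rate_numeric(R, V, W, b, dt=1e-4):
    """Central-difference estimate of the rate of change of disparity."""
    Rdot = point_velocity(R, V, W)
    fwd = tuple(r + dt * d for r, d in zip(R, Rdot))
    bwd = tuple(r - dt * d for r, d in zip(R, Rdot))
    return (disparity(fwd, b) - disparity(bwd, b)) / (2.0 * dt)


def disparity_rate_predicted(R, V, W, b):
    """Rate of change of disparity from relation (18), times disparity."""
    x, y = R[0] / R[2], R[1] / R[2]      # image coordinates of the point
    d = disparity(R, b)
    return d * ((V[2] / b) * d + (y * W[0] - x * W[1]))
```

Only $V_Z$, $\Omega_X$ and $\Omega_Y$ enter the prediction; the remaining motion components move the point without changing its depth to first order.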

3.3 Using the Correlation to Establish
Stereo Correspondence

If the relative motion between the cameras and the
objects in the field of view is known, the binocular flow
relations can be used to establish stereo correspondence
directly. To do this, the correspondence of potential
matches is tested by substituting the measured velocities
and positions of a pair of points into Equations (13).
This technique is described in more detail in Waxman
and Duncan (1985). If the relative motion parameters are
unknown, it is possible to establish correspondence using
a local support technique with the linearized version of
the binocular flow relations (15). This latter technique is
similar to that suggested by Prazdny (1984) for use in
stereo matching with static images.
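For the known-motion case, the test of a potential match might look like the following sketch. It works in a single coordinate system (the relations have the same form in left and cyclopean coordinates), and the function names and tolerance values are hypothetical choices, not taken from the experiments.

```python
def match_residual(left_pos, left_vel, right_pos, right_vel,
                   baseline, Vz, Wx, Wy):
    """Residuals of the binocular flow relations for a candidate match.

    Returns (res_x, res_y): the mismatch between the measured relative
    flow and the value predicted from the candidate disparity.
    """
    x, y = left_pos
    delta = right_pos[0] - x                      # candidate disparity
    predicted = (Vz / baseline) * delta ** 2 + (y * Wx - x * Wy) * delta
    res_x = (right_vel[0] - left_vel[0]) - predicted
    res_y = right_vel[1] - left_vel[1]            # should vanish
    return res_x, res_y


def is_match(left_pos, left_vel, right_pos, right_vel,
             baseline, Vz, Wx, Wy, pos_tol=1e-3, vel_tol=1e-6):
    """Accept a candidate pair if it lies on a common epipolar line
    and both flow residuals are small (tolerances are assumptions)."""
    if abs(right_pos[1] - left_pos[1]) > pos_tol:
        return False
    res_x, res_y = match_residual(left_pos, left_vel, right_pos, right_vel,
                                  baseline, Vz, Wx, Wy)
    return abs(res_x) <= vel_tol and abs(res_y) <= vel_tol
```

With real data the velocity tolerance would have to reflect the measurement noise in the flow field rather than rounding error.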



4.  EXPERIMENTS 

A limited experimental program was undertaken to
demonstrate the feasibility of implementing the first three
steps of the stereo-motion module: Step 1 (flow recovery
and segmentation), Step 2 (establishing correspondence
using the binocular difference flow) and, to a limited
extent, Step 3 (recovering surface structure). A brief
description of the experiments is given below. The
interested reader is referred to Waxman and Duncan
(1985) if more detail is desired. Binocular image flow
fields were obtained using a camera mounted on a robot
arm, viewing scenes consisting of white objects covered
by black dots. In general, the experiments were successful
insofar as they confirmed the potential of overlap compatibility
for segmentation of laboratory flow data, and
verified the binocular difference flow-disparity relations
for a particular configuration. Still, much work remains
before a fully automatic module is realized.



4.1 Apparatus and Procedures

The moving pair of stereo cameras was simulated
using a single, black and white, Sony (model DC-37)
CCD camera mounted on an American Robot MERLIN
robot arm. The images were digitized into 480 X 420
pixel arrays using a Grinnell (GMR-27) display processor
and memory. Throughout this section, all angular measurements
are given in units of pixels; time is in units of
seconds. Each image flow field was obtained from three
frames taken with the camera at three positions, equally
spaced in time, on its trajectory. The trajectories and
viewing directions were chosen to simulate a pair of cameras
in a parallel stereo configuration (cf. Fig. 3). The
baseline between cameras was 3.0 inches.

The scenes consisted of white surfaces covered with a
distribution of 0.125 inch diameter black dots. From the
typical viewing distance of 50 inches the dots appeared in
the image with a diameter of 3 pixels. The centroids of
the dots were tracked for three frames and velocities at
the centroids in the central frame in time were computed.
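The text does not state how the velocities were computed from the three frames. One simple possibility, assumed here purely for illustration, is a central difference of the tracked centroid positions, which coincides with the slope at the middle frame of a parabola passed through three equally spaced samples. The function names are hypothetical.

```python
def centroid(pixels):
    """Mean position of the (x, y) pixels belonging to one dot."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def central_velocity(pos_first, pos_last, frame_interval):
    """Velocity at the central frame from the first and last centroid
    positions of three frames equally spaced in time (an assumed scheme)."""
    span = 2.0 * frame_interval
    return ((pos_last[0] - pos_first[0]) / span,
            (pos_last[1] - pos_first[1]) / span)
```

The resulting velocity is in pixels per second when positions are in pixels and the frame interval is in seconds, matching the units used in this section.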

4.2  Image  Flow  Segmentation 

We have analyzed the scene shown in Figure 4,
which consists of a planar background with two connected
planar surfaces in the foreground. The effective
camera motions, also shown in the figure, were 0.25
inches/sec in the viewing direction (toward the scene) and
0.25 inches/sec in the X-direction (parallel to the
scene). At the central frame the cameras were about 40
inches from the foreground surfaces.

Fig. 4 - Two Views of the Scene Used for the Segmentation
Experiments.

The current segmentation program reveals the
potential locations of flow discontinuities, but does not
refine them nor link them into global boundaries of
analyticity. The program first divides the image into $N^2$
equal-sized rectangles; in this case, a 5 X 5 rectangular
grid on each 480 X 420 pixel image. Each rectangle contained
an average of about 10 feature points. The velocity
data in each rectangle were then fit to a pair of
second-order polynomials (cf. equations 3) using a linear
least squares approach. The error per point between the
data and the second order fit, defined as


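$$\mathrm{err} = \left( \frac{1}{N} \sum_{i=1}^{N} \left| \mathbf{v}_{\mathrm{poly}}(x_i, y_i) - \mathbf{v}_i \right|^2 \right)^{1/2}, \qquad (24)$$

was typically 0.02.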
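The fitting step can be sketched as follows. This is a minimal illustration, assuming that each velocity component is fit to a general six-coefficient quadratic in the image coordinates and that the error per point is a root-mean-square residual; equations (3) are not reproduced in this section, so the exact polynomial form used in the experiments may differ. All names are hypothetical.

```python
import math


def basis(x, y):
    """Monomials of a general second-order polynomial in (x, y)."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(A, rhs):
    """Solve a small dense linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / M[r][r]
    return sol


def fit_quadratic(points, values):
    """Least-squares coefficients via the normal equations.
    Needs at least six points in general position."""
    rows = [basis(x, y) for x, y in points]
    n = 6
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    Atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(n)]
    return solve(AtA, Atb)


def evaluate(coef, x, y):
    """Value of the fitted polynomial at (x, y)."""
    return sum(c * b for c, b in zip(coef, basis(x, y)))


def fit_error(points, vx, vy, coef_x, coef_y):
    """Root-mean-square residual per point (assumed form of the error)."""
    total = 0.0
    for (x, y), u, v in zip(points, vx, vy):
        total += (evaluate(coef_x, x, y) - u) ** 2
        total += (evaluate(coef_y, x, y) - v) ** 2
    return math.sqrt(total / len(points))
```

With about 10 feature points per rectangle, as reported above, the six-coefficient fit for each component is only mildly overdetermined, so the residual is a fairly optimistic estimate of the measurement noise.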

In an attempt to see if the polynomial flow fields
from adjacent rectangles were compatible, i.e., belonged
to the same analytic flow region, the velocities were compared
in overlapping neighborhoods. Specifically, at vertical
boundaries between left and right rectangles and at
horizontal boundaries between upper and lower rectangles,
an overlap compatibility measure ($C_v$ and $C_h$,
respectively) was computed,

$$C_v = \frac{2.0}{(\mathrm{err}_r + \mathrm{err}_l)} \left[ \frac{1}{A_v} \iint_{A_v} (\mathbf{v}_r - \mathbf{v}_l)^2 \, dx\, dy \right]^{1/2}, \qquad (25a)$$

$$C_h = \frac{2.0}{(\mathrm{err}_t + \mathrm{err}_b)} \left[ \frac{1}{A_h} \iint_{A_h} (\mathbf{v}_t - \mathbf{v}_b)^2 \, dx\, dy \right]^{1/2}, \qquad (25b)$$

where $A_v$ and $A_h$ are areas around the vertical and horizontal
boundaries respectively. The velocities $\mathbf{v}_r$ and $\mathbf{v}_l$
refer to the velocity functionals from the right and left
sides of vertical boundaries, while $\mathbf{v}_t$ and $\mathbf{v}_b$ refer to the
velocity functionals from the top and bottom sides of horizontal
boundaries. After computing the compatibility for
the original 5 X 5 rectangular grid, the calculations were
repeated twice with the grid shifted to the right in each
case by one-third the rectangle width (approximately the
distance between feature points). The three horizontal
grid positions were then repeated with the grid shifted
down by one-half the rectangle height. Thus, the overlap
error was computed for the boundaries of 6 rectangular
grids with 25 rectangles in each grid. A plot of the overlap
compatibility function is shown in Figure 5 for the
vertical boundaries of the left image. A similar plot for
the horizontal boundaries appears in Figure 6. Consider
the compatibility across vertical boundaries first, Figure
5. Note that the contours with $C_v = 4$ (i.e., four times
the error in fitting the polynomials) do not correspond to
any structural feature of the scene. Thus, the noise level
appears to be about 4. In Figure 5, both the vertical
occluding boundary and the vertical structural edge
appear in the contours with compatibility errors as high
as 10, i.e., 2.5 times the noise level. For the structural
edge (i.e., the slope discontinuity) the largest values
appear slightly to the right of the feature. Note that
these contours also indicate, to some extent, the position
of the horizontal occluding boundary. This horizontal
boundary is seen more clearly in the compatibility of
upper-lower pairs of rectangles, Figure 6. The compatibility
function is again typically 8 to 10 at the boundary.
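The overlap compatibility measure can be approximated numerically as in the sketch below. It assumes that the integrand is the squared magnitude of the vector velocity difference, that the integral is normalized by the overlap area, and that a midpoint rule on a regular grid is an adequate quadrature; the function name and sampling density are hypothetical.

```python
import math


def overlap_compatibility(flow_a, flow_b, err_a, err_b, region, samples=10):
    """Approximate the overlap compatibility between two fitted flows.

    flow_a, flow_b: callables (x, y) -> (vx, vy) from adjacent rectangles.
    err_a, err_b:   fitting errors of the two polynomial flows.
    region:         (x0, x1, y0, y1) overlap area around the boundary.
    """
    x0, x1, y0, y1 = region
    total = 0.0
    for i in range(samples):
        x = x0 + (i + 0.5) * (x1 - x0) / samples
        for j in range(samples):
            y = y0 + (j + 0.5) * (y1 - y0) / samples
            ax, ay = flow_a(x, y)
            bx, by = flow_b(x, y)
            total += (ax - bx) ** 2 + (ay - by) ** 2
    # Mean over equal-area cells equals (1/A) times the integral
    # under the midpoint rule.
    mean_sq = total / (samples * samples)
    return 2.0 / (err_a + err_b) * math.sqrt(mean_sq)
```

Because the measure is normalized by the mean fitting error, a value near 1 indicates that the two fits disagree by no more than their own residuals, while the values of 8 to 10 reported at boundaries correspond to disagreements several times larger than the noise.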

The flow field segmentation results indicate that the
overlap compatibility method can successfully locate
occluding boundaries (i.e., depth discontinuities) and to
some extent structural edges (i.e., slope discontinuities) in
real data. However, the noise level and resolution of the
results need to be improved. It is believed that both of
these problems can be remedied by increasing the density
of data points in the images (Waxman and Duncan,
1985).


4.3  Binocular  Flow  Field  Experiments 

In this section we describe a preliminary experimental
exploration of the binocular flow equations (11). In
particular, a $V_Z$ motion was chosen for the camera pair
and the equations were verified. It was pointed out in
Waxman and Duncan (1985) that the $V_Z$ motion is one
of the two single component motions that will allow accurate
discrimination between correctly and incorrectly
matched features. The experiment used the camera set-up
described earlier to simulate a pair of cameras separated
by a 3 inch baseline. The cameras viewed a planar surface
perpendicular to the viewing direction (i.e., a frontal
plane). The velocity fields were obtained from three
images with the cameras at 43.5, 45.0 and 46.5 inches
from the surface.

Fig. 5 - Overlap Compatibility Contours Across Vertical
Boundaries, $C_v$ - Left Image.

Fig. 6 - Overlap Compatibility Contours Across Horizontal
Boundaries, $C_h$ - Left Image.

The binocular flow equations (11) were verified by
two techniques: one using the individual data points and
the other using the polynomial fits to the velocity fields;
the space motion being known in both cases here (which
is not generally true). In general both methods proved
successful in this preliminary experimental program. The
details are given in Waxman and Duncan (1985).


6.  CONCLUSIONS 

In this paper we have outlined a set of five steps
toward the development of a stereo-motion fusion
module. The successful development of a complete
module of this type has enormous potential for robotics
in a dynamic environment. It may also shed some light
on the nature of the processing going on in the human
visual pathway. In this respect, the work of Regan and
Beverley (1979) is most relevant, for their own psychophysical
and neurophysiological studies have led them to
suggest the existence of neural organizations which may
"compute" the binocular difference flow (or relative flow
between the eyes) which is so basic to our own theory.

The basic advantages this module offers over static
stereo are: monocular detection of the depth and orientation
discontinuities (before matching is attempted), use of
a correlation between binocular difference flow and
disparity to drive the matching process (either independent
of, or in conjunction with matching based on disparity
alone), the ability to refine disparity estimates to
sub-pixel accuracy by considering the smooth orbits of
features through the left and right image space-times,
and the potential to focus attention of the matching process
to the areas where new features enter the field of
view. The advantages of this module over structure from
monocular motion are: the ability to recover absolute
structure and rigid body motions (without scale factor
ambiguities), and that only linear equations need be
solved to recover rigid body motion parameters.

Still, much work remains to be done before a complete
module of this type can be constructed. The control
structure for the flow segmentation procedure
requires further development. This segmentation procedure
should be iterative, with subsequent refinements
occurring near detected flow discontinuities. The discontinuities
in left and right images must also be matched in
order to establish gross correspondence among analytic
regions. The binocular difference flow-disparity relation,
derived in Section 3, requires further testing in order to
insure its validity under more general classes of motion
than tried here. It should also be generalized to incorporate
vergence effects. The matching techniques
described in Section 5 need to be implemented and tested
in a variety of cases. The ability to combine evidence in
establishing correspondence is an appealing aspect of the
approach and needs to be implemented as well.

The possible role of a combined stereo-motion
module, such as this one, in the human visual processing
task raises some interesting questions. How does the
brain utilize disparity estimates and binocular flow-disparity
cues in establishing correspondence? Does one
take priority over the other, or are they combined? What
happens when structure from binocular flow conflicts with
structure from static stereo (Mayhew and Frisby, private
communication)? Does one percept dominate or do we
see illusions? Are there certain kinds of "head motions"
preferred for disambiguating false matches? Is there a
"gradient limit" effect associated with the coefficients of
the linear terms in equation (15a)? Is it possible to fuse a
dynamic stereogram which is beyond the static disparity
gradient limit of unity? Perhaps psychophysical experiments
can resolve some of these questions.


REFERENCES 


Adiv, G. 1984 (October). Determining 3-D motion and structure
from optical flow generated by several moving objects. Proc.
DARPA Image Understanding Workshop, New Orleans: SAIC, pp.
113-129.

Burt, P. and Julesz, B. 1980. A disparity gradient limit for binocular
fusion. Science 208: 615-617.

Buxton, B.F., Buxton, H., Murray, D.W. and Williams, N.S. 1984.
3-D solutions to the aperture problem. European Conf. Artificial
Intelligence '84.

Eastman, R. and Waxman, A.M. 1985. Using disparity functionals
for stereo correspondence and surface reconstruction. Tech. Report
in preparation. College Park, MD: University of Maryland, Center for
Automation Research.

Grimson, W.E.L. 1981. From Images to Surfaces. Cambridge:
M.I.T. Press.

Jenkin, M.R.M. 1984 (September). The stereopsis of time-varying
images. Tech. Report RBCV-TR-84-3. Toronto, Canada: University of
Toronto, Dept. of Computer Science.

Koenderink, J.J. and van Doorn, A.J. 1975. Invariant properties of
the motion parallax field due to the movement of rigid bodies relative
to an observer. Optica Acta 22: 773-791.

Koenderink, J.J. and van Doorn, A.J. 1976. Geometry of binocular
vision and a model for stereopsis. Biol. Cybernetics 21: 29-35.

Longuet-Higgins, H.C. 1981. A computer algorithm for reconstructing
a scene from two projections. Nature 293: 133-135.

Longuet-Higgins, H.C. and Prazdny, K. 1980. The interpretation
of a moving retinal image. Proc. Roy. Soc. Lond. B208: 385-397.

Marr, D. 1982. Vision. San Francisco: Freeman.

Marr, D. and Poggio, T. 1979. A computational theory of human
stereo vision. Proc. Roy. Soc. Lond. B204: 301-328.

Mayhew, J.E.W. and Frisby, J.P. 1981. Psychophysical and computational
studies towards a theory of human stereopsis. Artificial
Intelligence 17: 349-385.

Poggio, G.F. and Poggio, T. 1984. The analysis of stereopsis. Ann.
Rev. Neurosci. 7: 379-412.

Pollard, S.B., Mayhew, J.E.W. and Frisby, J.P. 1985. Disparity gradients
and stereo correspondences. Preprint, Dept. Psychology,
Sheffield University.

Prazdny, K. 1980. Egomotion and relative depth map from optical
flow. Biol. Cyber. 36: 87-102.

Prazdny, K. 1984. Detection of binocular disparities. Preprint,
Fairchild Laboratory for Artificial Intelligence Research. Biol.
Cyber. (in press) 1985.

Regan, D. and Beverley, K.I. 1979. Binocular and monocular stimuli
for motion in depth: Changing-disparity and changing-size feed the
same motion-in-depth stage. Vision Research 19: 1331-1342.

Richards, W. 1983. Structure from stereo and motion. A.I. Memo
731. Cambridge, MA: Massachusetts Institute of Technology,
Artificial Intelligence Laboratory. See also, J. Opt. Soc. Amer. A2:
343-349 (1985).

Subbarao, M. and Waxman, A.M. 1985. On the uniqueness of image
flow solutions for planar surfaces in motion. Tech. Report 113. College
Park, MD: University of Maryland, Center for Automation
Research.

Tsai, R.Y. and Huang, T.S. 1981a. Uniqueness and estimation of 3-D
motion parameters of rigid objects with curved surfaces. Report
R-921. University of Illinois/Urbana-Champaign Coordinated Science
Lab.

Tsai, R.Y. and Huang, T.S. 1981b. Estimating 3-D motion parameters
of a rigid planar patch. Report R-922. University of
Illinois/Urbana-Champaign Coordinated Science Lab.

Ullman, S. 1979. The Interpretation of Visual Motion. Cambridge:
MIT Press.

Ullman, S. 1983. Maximizing rigidity: the incremental recovery of
structure from rigid and rubbery motion. Memo 721. Cambridge,
MA: Massachusetts Institute of Technology, Artificial Intelligence
Laboratory.

Waxman, A.M. 1984 (April). An image flow paradigm. Proc. 2nd
IEEE Workshop on Computer Vision: Representation and Control,
Annapolis: IEEE, pp. 49-57.

Waxman, A.M. and Duncan, J.H. 1985 (May). Binocular Image
Flows: Steps Toward Stereo-Motion Fusion. Tech. Report 119.
College Park, MD: University of Maryland, Center for Automation
Research.

Waxman, A.M. and Sinha, S. 1984 (October). Dynamic Stereo:
Passive ranging to moving objects from relative image flows. Proc.
DARPA Image Understanding Workshop, New Orleans: SAIC, pp.
130-136.

Waxman, A.M. and Wohn, K. 1985. Contour evolution, neighborhood
deformation and image flow: Textured surfaces in motion.
Image Understanding 1985, (eds.) W. Richards and S. Ullman. Norwood:
Ablex Publishing.

Wohn, K. 1984. A contour-based approach to image flow. Ph.D.
Thesis, University of Maryland, Department of Computer Science.

Wohn, K. and Waxman, A.M. 1985a. Contour evolution, neighborhood
deformation and local image flow: Curved surfaces in motion.
Tech. Report in preparation. College Park, MD: University of Maryland,
Center for Automation Research.

Wohn, K. and Waxman, A.M. 1985b. The analytic structure of
image flows: Deformation and segmentation. Tech. Report in
preparation. College Park, MD: University of Maryland, Center for
Automation Research.

Waxman, A.M. and Ullman, S. 1983 (October). Surface structure
and 3-D motion from image flow: A kinematic analysis. Tech.
Report 24. College Park, MD: University of Maryland, Center for
Automation Research. Also see Int. J. Robotics Research 4 (3),
1985.

Waxman, A.M. and Wohn, K. 1984 (April). Contour evolution,
neighborhood deformation and global image flow: Planar surfaces in
motion. Tech. Report 58. College Park, MD: University of Maryland,
Center for Automation Research. Also see Int. J. Robotics
Research 4 (3), 1985.


Detecting Structure in Random-Dot Patterns


Richard  Vistnes 


Computer  Science  Department 
Stanford  University,  Stanford,  California  94305 


Abstract 

This paper presents the results of some psychophysical experiments
whose purpose was to determine the parameters
that affect the detectability of straight dotted lines and
curved dotted lines, embedded in a surround of random
dots. Results show that performance does not depend on
line length or regularity of dot spacing, but rather on the
ratio of dot spacing relative to surround, on target curvature
and on "jaggedness" of the target. Speculations are
made about a mechanism that may be able to duplicate
this performance, as well as perform two other important
tasks in early vision.


Introduction 


An important task in early vision is the detection of patterns
in images. As Witkin and Tenenbaum (1983) pointed
out in a recent paper, human beings can usually find the
structures in images at a very early stage, before any semantic
meaning has been attached to those structures.
These structures are usually detected by what is known in
the psychological literature as "pre-attentive vision" (see
Neisser (1967), and Treisman (1985) for some recent work).
Pre-attentive human vision is able to perform perceptual
tasks, probably by using parallel machinery, in a very short
time (under 200 milliseconds). Pre-attentive discriminations
are made automatically, without focused attention.
Pre-attentive vision can detect lines and curves in images;
find instances of parallelism and symmetry; and segment
images on the basis of a variety of "texture" differences,
such as intensity, size of elements, color of elements, and
orientation of elements. For an example of what can be accomplished
by pre-attentive vision and what cannot, consider
Figure 3. The linear structure in (a) is immediately
apparent, while the structure in (b) is not perceivable
without scrutiny, that is, without attention.


Pre-attentive vision is closely related to the phenomenon
of perceptual organisation in vision. Perceptual organisation
can be defined loosely as the process of forming
descriptions of the significant structures in the image.
Some important kinds of structure include symmetry, parallelism,
linear structures, curved structures, etc. Human
perceptual organisation is usually pre-attentive, i.e., it occurs
rapidly and in parallel across the visual field.


While studying the phenomenon of perceptual organisation,
I became interested in the question of how well human
beings can perceive curvilinear structures when those
structures are embedded in a noisy background. This question
has practical import in computer vision as well as in
human perception. In current theories of image understanding,
one of the first steps in image processing is edge
finding. Local edge operators are applied to the image;
the result of this process is an array of "edgels" that indicate
the presence of an edge at that point. These edgels
then must be linked into lines and curves. This problem
is difficult because there is typically noise surrounding the
"correct" edgels. Lowe (1984, 1985) studied the problem
of linking points into curvilinear structures. He used a statistical
approach similar to that proposed by Witkin and
Tenenbaum (1983) for finding unlikely patterns in images.


Kass and Witkin (1985) have done some work related
to this problem. Their system analysed oriented patterns,
such as the well-known Glass patterns, and found vector
fields corresponding to the dominant flow in the pattern.
Zucker (1982, 1983) used lateral inhibition among
orientation-specific operators to estimate local orientation
in dot patterns.

In order to better understand how people perform in
this task, I performed a number of fairly careful (if tedious)
psychophysical experiments in order to find the parameters
that affect the detectability of certain patterns
in images. This paper reports some preliminary results
of those experiments. The patterns in which I was interested
were dotted straight lines and dotted circular arcs
embedded in a surround of random dots. It is useful to
study dot patterns, since doing so removes any shape or intensity
differences between the elements of the pattern and forces
the visual system to use only textural segregation mechanisms.
The experiments varied several parameters of the
target pattern, such as the length, or spacing of dots, and
measured subjects' accuracy in detecting the presence of
the target. There was typically a threshold for detection
below which the target could not be easily seen, and above
which it could be detected accurately. In this study, I was
not interested in finding the precise threshold values for
the parameters that determine our ability to detect these
patterns. Rather, I was interested in determining which
parameters affect our performance, and the general range
of values that determine the threshold.


In this paper, I first report the experimental setup
and follow with the experiments and their results. I then
make some speculations about how the problem of finding
such patterns in noisy backgrounds could be solved,
both by biological and machine vision systems. I present
some preliminary results of an algorithm to find curvilinear
patterns such as those considered here, and suggest that
the same mechanism may be useful for solving two related
problems in perceptual organisation.

Experimental  Methods 

In all these experiments, the images were displayed to subjects
for a short time (about 200 msec). As I noted above,
I am interested in pre-attentive detection of structure, not
the structure that can be seen only upon prolonged inspection.
The 200 msec time limit prevents prolonged inspection
of the pattern, and also prevents the subject from
moving the eyes to use foveal high resolution to inspect a
part of the image.

In this paper, I present the results of experiments that
test detection of two kinds of target: straight dotted lines
and curved dotted lines, embedded in a background of random
dots. The target appeared in the image in one of four
positions, chosen at random: vertical on the right or left,
or horizontal on the top or bottom. These positions are
shown in Figure 1. The image was presented for a short
time, after which the subject indicated (by a keystroke)
where he or she thought the target had been. The computer
accumulated the accuracy of detection over a number
of trials.

We now look at the experimental parameters that control
the appearance of the target and the surround in these
displays. The appearance of the random surround can
briefly be described (but see the details below) by one parameter,
the average spacing between dots d_s (Distance in
Surround).

Consider the parameters that affect the shape of the
dotted target. First, a straight line of evenly-spaced dots
has two parameters: the length L of the line and the spacing
d_t of the dots. The number of dots in such a line
is ⌊L/d_t⌋. We can allow the dots to be unevenly spaced
along the line, but still have the same average spacing d_t,
by specifying a longitudinal variation v_L. This parameter
specifies the maximum variation in a direction along the
line. Each dot in the line will be perturbed by an amount
0.5 R d_t v_L in either direction along the line away from its
nominal (evenly-spaced) position, where R is a random
number between -1 and 1. When v_L is zero, the dots are
evenly-spaced.

We can allow the dots to be displaced away from a
straight line (to form a jagged line) by specifying a transverse
variation v_T. This specifies a maximum variation in
a direction normal to the line. Each dot is perturbed by an
amount 0.5 R d_t v_T in a direction normal to the line, again
away from its nominal position; R is again a random number
between -1 and 1. When v_T is zero, the dots fall in a
straight line.


By specifying various values for these four parameters,
many sorts of dotted lines can be generated. They all have
the property that the average number of dots in the line
per unit length (or the average density of dots) is uniform.
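
As an illustration only, the line-generation rule described above can be sketched in Python. The function name dotted_line, its argument names, and the choice of a horizontal line starting at the origin are assumptions made for this sketch; they are not taken from the original program.

```python
import random

def dotted_line(length, d_t, v_long=0.0, v_trans=0.0, rng=None):
    """Return (x, y) dots of a horizontal dotted line starting at the origin.

    length  -- line length L
    d_t     -- nominal spacing between dots in the target
    v_long  -- longitudinal variation (0 gives evenly spaced dots)
    v_trans -- transverse variation (0 gives a perfectly straight line)
    """
    rng = rng or random.Random()
    n = int(length // d_t)  # number of dots, floor(L / d_t)
    dots = []
    for i in range(n):
        # Each perturbation is 0.5 * R * d_t * v, with R uniform in [-1, 1].
        x = i * d_t + 0.5 * rng.uniform(-1, 1) * d_t * v_long
        y = 0.5 * rng.uniform(-1, 1) * d_t * v_trans
        dots.append((x, y))
    return dots
```

With both variations set to zero the sketch returns evenly spaced collinear dots; raising v_trans makes the line jagged while leaving the average spacing unchanged.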

Now consider a dotted circular arc, such as the one
shown in Figure 2. The shape of an arc of evenly-spaced
dots can be specified by the length L of the chord joining
the ends of the arc, the height H of the arc above this
chord, the average spacing d_t of dots in the arc, and the
transverse and longitudinal maximum variations v_T and
v_L. To be precise, the dot spacing d_t is the linear distance
between dots, rather than the distance along the arc
joining them; that is, it is the distance along the chord
joining the dots. The transverse variation is in the direction
along the line through the dot and the center of the
circle of which the arc is a part, i.e., along the diameter of
the circle. The longitudinal variation is along a line normal
to this diameter, that is, along the tangent to the circle.
Note that when H is zero, an arc specified in this way is
identical to a straight line as specified above.
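
A minimal sketch of the arc construction follows, assuming the chord lies along the x-axis from (0, 0) to (L, 0) and the arc bulges upward. The function name dotted_arc and the placement of the first dot at the left end of the chord are assumptions of this sketch. The radius follows from the chord length and height as R = ((L/2)^2 + H^2) / (2H).

```python
import math
import random

def dotted_arc(L, H, d_t, v_long=0.0, v_trans=0.0, rng=None):
    """Dots on a circular arc whose chord runs from (0, 0) to (L, 0)."""
    rng = rng or random.Random()
    if H == 0:  # degenerate case: a straight line along the chord
        n = int(L // d_t)
        return [(i * d_t + 0.5 * rng.uniform(-1, 1) * d_t * v_long,
                 0.5 * rng.uniform(-1, 1) * d_t * v_trans) for i in range(n)]
    c = L / 2.0
    R = (c * c + H * H) / (2.0 * H)           # radius of the circle
    half = 2.0 * math.atan2(H, c)             # half of the central angle
    step = 2.0 * math.asin(d_t / (2.0 * R))   # adjacent dots are d_t apart (chord)
    n = int(2.0 * half / step + 1e-9)         # mirrors floor(L / d_t) for a line
    cx, cy = c, H - R                         # center of the circle
    dots = []
    for i in range(n):
        a = -half + i * step                  # angle measured from the arc's apex
        radial = 0.5 * rng.uniform(-1, 1) * d_t * v_trans      # along the radius
        tangential = 0.5 * rng.uniform(-1, 1) * d_t * v_long   # along the tangent
        x = cx + (R + radial) * math.sin(a) + tangential * math.cos(a)
        y = cy + (R + radial) * math.cos(a) - tangential * math.sin(a)
        dots.append((x, y))
    return dots
```

The sketch assumes d_t is no larger than the diameter of the circle; otherwise no two points on the arc can be d_t apart.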

Details 

In this section I will present some of the details of the
experimental setup and of the image-generating process.
The casual reader need not read this, and may wish to
skip to the next section.

Experimental setup. The images were generated by, and displayed
on the screen of, a Symbolics 3600 Lisp Machine. In
these experiments, the width of the image was 5 inches (460
pixels); this is equivalent to about 91 pixels per inch. The subject's
eyes were about 17 inches away from the image, so the
image subtended 16.7 degrees; the length of the line in most
cases (200 pixels) was about 2 inches or 6.7°. The distance
from the center of the display (and the fixation point) to the
line was 1.2 inches or 4.0°.
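
The visual angles quoted above follow from simple trigonometry. The sketch below assumes an image width of 5 inches, a line length of about 2 inches, an offset of 1.2 inches, and a viewing distance of 17 inches; the function names are introduced here for illustration only.

```python
import math

def subtended_angle_deg(size, distance):
    """Angle subtended by an object of the given size centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

def eccentricity_deg(offset, distance):
    """Angle between the fixation point and a point offset from it."""
    return math.degrees(math.atan(offset / distance))

image_angle = subtended_angle_deg(5.0, 17.0)   # whole image
line_angle = subtended_angle_deg(2.0, 17.0)    # target line
line_offset = eccentricity_deg(1.2, 17.0)      # fixation point to line
```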

The author was the subject for most of the experiments
reported in this paper, since the experiments were so tedious
that it was difficult to persuade others to take part. However,
the services of two other subjects were obtained for a short
time in order to verify that the data reported here is similar for
different people.

Image generation. When I first generated images of dotted
lines embedded in a surround of random dots, I noted that
the dots in the surround often formed spurious groups (clusters
and lines), simply due to random proximity of dots. This often
proved distracting when trying to detect the dotted lines themselves.
To avoid this problem, I used a method that generates
more uniform (pseudo-random) patterns of dots.

The method works as follows. The algorithm is given an
average separation d_s for the dots in the surround. The nominal
position for each dot is at the intersection of grid lines d_s
units apart. Such a pattern is completely regular. The position
of each dot is then perturbed by independent amounts in the
vertical and horizontal directions. The maximum amount of
perturbation is specified by a parameter v_s; the perturbation is
0.5 R d_s v_s, where R is some random number in [-1, 1]. Another
way of looking at this is that the dots are placed at random in a
square with side d_s v_s. When v_s is zero, the dots are completely
uniform; as v_s increases, the pattern looks more "random."
When v_s > 1 the dots may touch, so for these experiments v_s
was less than 1. When v_s is nearly 1, the patterns generated
appear quite random, as evidenced by the surround of Figure 3.

Note that v_s was essentially an artifact of the process that
generated the random surround; I did not consider it to be a
parameter of the experiment. Therefore it was left constant at
0.8 in these experiments.
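
The perturbed-grid method can be sketched as follows. This is a hedged reconstruction from the description above, not the original code: the function name jittered_surround, the image-size arguments, and the placement of the first grid intersection at the origin are assumptions.

```python
import random

def jittered_surround(width, height, d_s, v_s=0.8, rng=None):
    """Place one dot near each grid intersection, d_s units apart.

    Each coordinate is perturbed independently by 0.5 * R * d_s * v_s,
    with R uniform in [-1, 1], so every dot lies in a square of side
    d_s * v_s centered on its nominal grid position.
    """
    rng = rng or random.Random()
    dots = []
    for row in range(int(height // d_s)):
        for col in range(int(width // d_s)):
            x = col * d_s + 0.5 * rng.uniform(-1, 1) * d_s * v_s
            y = row * d_s + 0.5 * rng.uniform(-1, 1) * d_s * v_s
            dots.append((x, y))
    return dots
```

Because every grid cell receives exactly one dot, the dot density is the same for any v_s; only the regularity of placement changes.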

There is another subtlety in generating these images.
When a dotted line is embedded in a surround of dots, it is
possible for the dots in the line to fall close to dots in the
surround. This creates local clusters of dots, and may trigger
processes that we don't wish to study in this experiment. To
avoid this problem, we need to make sure that there are no dots
from the surround in the immediate vicinity of the dots in the
line.  To  be  precise,  no  dot  should  fall  within  a  radius  of  0.5 d, 
of any dot in the line. When generating images, we can either
generate the surround dots first, then the dots in the line, and
erase nearby dots; or else generate dots in the line first, then
dots in the surround, and make sure not to put in the image
any surround dot that is too close to a line dot. The latter
approach was taken in these experiments.

Figure 2. A typical circular arc; chord length is L and
height is H.
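
The exclusion step can be sketched as a simple filter. The function name and the min_dist argument are assumptions of this sketch; the caller supplies the exclusion radius described above.

```python
import math

def surround_without_crowding(surround, line_dots, min_dist):
    """Keep only surround dots at least min_dist from every dot in the line."""
    return [p for p in surround
            if all(math.dist(p, q) >= min_dist for q in line_dots)]
```

This filters an already generated surround; rejecting each candidate surround dot as it is generated, as in the approach taken here, gives the same set of dots.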

Experiments and Results

I was interested in finding the effect of all the parameters
of the lines upon detectability. The experiments reported
here attempted to isolate a single parameter and tested
to see how subjects' performance varied, as measured by
their accuracy in guessing the position of the target. Each
of the experiments tested the effect of one parameter on
detection ability.

I performed two primary experiments and several subsidiary
experiments. The main experiments sought to determine
the effect of deviation from a straight line, and
the effect of curvature, on detection performance. In the
former, a series of trials was presented in which the degree
of alignment of the dots in the line varied; that is, v_T varied
(recall that as v_T increases, so does the "jaggedness"
of the line). As expected, performance dropped off as the
deviation increased (details will be presented below). The
trials were randomized so that the amount of deviation
on a particular trial could not be predicted. In the latter
experiment, a series of trials was presented in which dotted
arcs of different curvatures appeared. The trials were
again randomized so that the shape of the target could not
be predicted. Again as expected, performance dropped off
as curvature increased.

In the course of performing the experiments on arc
detection, I noted that it was much easier to detect an arc
when the shape was known in advance. That is, if the
arcs presented in a series of trials all had the same L and
H, they all had the same shape. In this situation, it was
possible to infer the presence of the arc from the presence of
only a few dots in the right positions. On the other hand,
when arcs of several different shapes were intermixed in
the series of trials, they became more difficult to detect;
i.e., the dots in the arc needed to be closer together than
in the former case. Since I was interested in detection
of structure without knowing the shape of that structure
a priori, the shapes of the arcs were intermixed in these
experiments.

The subsidiary experiments sought to determine how
variables like length L and longitudinal variability affected
performance. These experiments received less emphasis,
in terms of both experimental time and prominence
in this paper, because they did not seem to be very important
in determining our threshold.

Initial experiments showed that it is possible to detect,
with nearly 100% accuracy, a straight-line target in a random
surround approximately when d_t < d_s, if v_T was close
to zero (i.e., < 0.2). That is, straight dotted lines can
be detected when the dots of which they are composed are
closer together than those in the surround. (The reader
may wish to look at Figures 3(a) and 3(b) and compare
the ease with which he or she can detect the dotted lines.)
In performing the experiments, I tried to see how altering
some of the parameters affected performance. However,
it was necessary to first set the other parameters so that
the resulting target was near a threshold of detectability.
That is, if d_s = 20 and d_t = 10, the dots in the target are
very close compared to those in the surround, and varying
another parameter such as H or v_L has very little effect
on performance; it will be close to 100%. However, if we
let d_s = 20 and d_t = 20 also, we are near the threshold
for detection, so the effects of different H or v_L can be
studied.

I will discuss the short, subsidiary experiments first.
Then I will discuss the effects of transverse variation, followed
by the effects of curvature, on performance.

Line  length 

I tested the detectability of lines of different L, for
cases above and near the threshold of detectability. Some
typical results are shown in Figure 4(a). Figure 4(b) plots
performance as a function of the number of dots in the line,
⌊L/d_t⌋. Note that the target can generally be detected
when there are more than 5 or 6 dots in the line, if it can
be detected at all (i.e., if a long target is above threshold).
Moreover, accuracy does not seem to increase (once past
the threshold) as the line length increases. This parameter,
then, does not seem to be an important one for detection.

Longitudinal  variation 

Recall that this parameter v_L controls the variation of
dot position along the line; as v_L increases the dots appear
more irregularly spaced. I tested the effect of increasing
variation on accuracy of detection. Some results of this
experiment are shown in Figure 5. This parameter does
not seem to be crucial in determining detectability either.


Figure 3. Comparison of difficulty of detecting straight-line
patterns. In (a), the parameters are: L = 200, d_s = 20,
d_t = 15, H = 0, v_s = 0.5, v_T = 0 and v_L = 0. In (b) the only
difference is that d_t = 22; the target is difficult to detect.

Figure 5 shows that v_L has a slight effect on performance
when the line is near the threshold of detectability, but in
general it is safe to say that longitudinal variation does not
matter much.

Transverse variation

In this experiment, the first of the primary experiments,
I tested the effect of transverse variation v_T on
accuracy of detection. These experiments were conducted
with straight lines only; that is, H = 0. Starting with
choices for the parameters that resulted in above-threshold
targets, the transverse variation was increased until performance
dropped to near the chance level (25%). This was


Figure 4. (a) Performance as a function of target length L.
(b) Performance as a function of number of dots in the line.

repeated for various values of d_s and d_t. L and v_L were
kept constant at 200 and 0.8, respectively, since those two
parameters didn't seem to make much difference in the accuracy.
Some typical results of this experiment are shown
in Figure 6. Note that accuracy typically starts off at a
plateau (100% if d_t < d_s), then drops as the threshold is
crossed, and finally levels off at the chance level.

I asked what is the maximum v_T that allowed detection,
for a particular setting of L, d_s, d_t, etc. I chose a
cutoff point of 80% accuracy to mark where performance
dropped off, i.e., the threshold. For each setting of the
parameters, then, I found the maximum v_T and plotted


Figure 6. Typical results: Performance as a function of
transverse variation v_T.

the results in Figure 7. Note that for constant d_s, as d_t
increases, the maximum tolerable variation decreases; that
is, as the distance between the dots in the target increases
relative to the distance between dots in the surround, less
jaggedness can be tolerated. This is in accord with our
intuition.

Consider the relative separation of the dots in the target
compared with those in the surround, d_t/d_s. If we plot
maximum variation v_T versus relative separation, we get
a good fit to a straight line, as shown in Figure 8. These




Figure 7. Maximum transverse variation v_T as a function of
dot separation d_t, for various constant d_s.

Figure 8. Maximum transverse variation v_T as a function
of relative separation d_t/d_s. The result is linear to a good
approximation; the slope of the regression line is -1.46 and
the correlation coefficient is -0.91.


results indicate that it is only the relative separation, not
the absolute separation, between the dots in the target
that affects performance.

Height of arc (arc curvature)

In this experiment, the second of the primary experiments,
I tested the effect of arc height, or curvature, on
performance. I presented a variety of arcs of different curvature
to the subject, each with the same L, d_t, d_s and
variations v. I was interested in how performance varied
with curvature. Recall that in this experiment, the curvatures
were randomized, hence the subject could not predict
the shape of the target; so the thresholds for detection are
somewhat higher than in the previous experiments. They
are also more general in the sense that we generally have
no prior knowledge of the shapes we will need to organize.
Some typical raw results of this experiment are shown in
Figure 9.

I again used a criterion of 80% for the threshold of detection,
and obtained the maximum H for each parameter
setting; the results are plotted in Figure 10. I again determined
the relative separation of dots in the arc and plotted
maximum H versus d_t/d_s in Figure 11(a). The curvature
κ of an arc with endpoint separation L and height H is

    \kappa = \frac{2H}{(L/2)^2 + H^2}


The maximum curvature is plotted as a function of relative
target separation in Figure 11(b). Note that the result is
again linear to a fairly good approximation (correlation
coefficient is -0.72). This again indicates that it is the
relative separation, rather than absolute distance, that is
important in detecting these kinds of dotted targets.
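
As a small worked check of the curvature formula, the sketch below computes κ from L and H; the function name arc_curvature is introduced here for illustration. The formula follows from the radius of the circle through the chord endpoints and the apex, R = ((L/2)^2 + H^2) / (2H), with κ = 1/R.

```python
def arc_curvature(L, H):
    """Curvature of a circular arc with chord length L and height H."""
    return 2.0 * H / ((L / 2.0) ** 2 + H ** 2)

# For L = 200 and H = 50 the radius is 125, so the curvature is 1/125.
kappa = arc_curvature(200, 50)
```

For H much smaller than L/2 the H^2 term in the denominator is small, so κ is approximately 8H/L^2, which grows nearly linearly with H.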

We can look at the results of this experiment another
way. For a given arc height (or, equivalently, arc curvature),
what is the farthest apart the dots in the arc can
be for the arc to be detected? That is, what is the maximum
d_t for a given H? I again used 80% as the threshold
for detection and obtained the results in Figure 12(a),
which shows the maximum d_t as a function of arc height
H, for various surround separations d_s. Note that there
is a point past which increasing the height has little effect
on the maximum d_t. In Figure 12(b) I have plotted the
maximum relative separation d_t/d_s as a function of arc
curvature. We see the same trend here: once curvature
increases past a certain point, the maximum separation
does not change much. It appears that curvature has an
increasingly detrimental effect on detectability when the
curvature is small, but the effect stabilizes when the curvature
is large.

Conclusions 

These experiments indicate that, to a good approximation,
the length of the target, or number of dots, is not
an important parameter in detection performance, as long
as there are at least half a dozen dots in the target. Nor is
the regularity of dot spacing along the line an important
factor.


Figure 9. Typical results: accuracy of detection of arcs as a
function of arc height H. Curves have constant d_s and d_t. All
arcs have L = 200, v_T = v_L = 0.

Figure 10. Maximum H as a function of dot separation d_t.
Curves have constant d_s. (The upswing at the end of two of
the curves is probably due to noise in the data.)

On the other hand, jaggedness, or deviation from a
straight line (as measured by v_T), is an important determiner
of performance; as the amount of transverse variation
increases, performance drops off. The results of these
experiments show that the amount of jaggedness we can
tolerate falls off linearly with the separation ratio. Performance
at the detection task (for straight lines) seems to


Figure 11. (a) Maximum H as a function of relative dot separation
d_t/d_s. Fitted line has a slope of -116 and a correlation
of -0.72. (b) Maximum curvature κ as a function of
d_t/d_s. (The curvature values plotted are 1000 times the computed
value, for ease of reading.) This looks very similar to
(a) because curvature as a function of H is nearly linear in
the range (0, 50).

depend only on the ratio of the spacing between the dots
in the target to the average spacing between dots in the
surround. This suggests that the surround is taken into
account by some sort of local difference mechanism, which
has yet to be elucidated.

Curvature  plays  an  important  role  in  detectability 
as  well.  These  experiments  confirm  the  intuition  that 
curved  dotted  lines  should  be  more  difficult  to  perceive 
than  straight  ones,  that  is,  the  dots  in  the  line  need  to 




Possible Mechanisms

I shall now speculate on mechanisms that might be able
to produce behavior similar to that found in these experiments,
both in computers and in biological vision systems.
I will present an outline of a mechanism that shows promise
of solving the problem of detecting the kinds of targets considered
here, as well as two other related, and important,
problems in early vision.

Two related problems

Consider the random-dot display in Figure 13. The
impression of bilateral symmetry is immediate, and is in
fact pre-attentive (see Barlow and Reeves (1979) and Barlow
(1980)). Detecting symmetry of shapes and patterns is
a problem of obvious importance for animals, and is one of
practical importance for computer vision systems as well.
The problem of finding parallel lines (Figure 14(a)) and
parallel curves (Figure 14(b)) is closely related. A single
mechanism should be able to find all these kinds of symmetry.




Now look at the image in Figure 15. This picture
creates a strong impression of two different textures. The
two sides have identical dot densities but a different rule
for placing the dots. The dots in one texture are "more
random" than those in the other texture. The dots on
each side were in fact generated by the algorithm described
early in this paper (see Details), with v_s different for the
two sides. I tested how well subjects can discriminate two
textures with regularity differences such as this, and the
results show that, as expected, when the variations in regularity
are quite different, the textures are easy to discriminate,
and for those that are more similar, the discrimination
becomes more difficult. The question is, what kind of
mechanism could detect texture differences such as this?


Figure 12. (a) Maximum separation of dots in arc d_t as a
function of arc height H. Curves have constant d_s. (b) Maximum
relative separation d_t/d_s of dots in arc as a function
of arc curvature. Note that the curve levels off. (Curvature
shown is again actually 1000 times actual curvature.)

be closer together relative to the surround. However, the
experiments show that the effects of curvature level off after
a point, and that further curvature increases do not
necessitate decreasing the dot distance in the arc.




Some  related  results 

Caelli (1981; see also Caelli et al., 1978) has noted
that human beings have the ability to segment textures
that differ in their θ-variability, that is, textures composed
of line segments whose orientations fall in a larger range in
one region than in another. For example, in Figure 16, the
left side has line segments whose orientations are 60 ± 10°
while those on the right side have orientations of 60 ±
30°. Thus they have the same average orientation but a
different variability or range. (Caelli's experiments used
dipoles, or dot pairs. The results are similar.) We can
also segment regions of small θ-variability from regions of
random orientation, as shown in Figure 17.

In fact, we can segment regions of just two, or even
three, fixed orientations from a region of random orientations,
as shown in Figure 18. This is in contradiction with
the results of Riley (1981), who claimed that two fixed
orientations cannot be segmented from random orientations.
However, his demonstration did not allow a large
enough area on which to base our comparisons. That is,
there was not enough information in the picture to enable
us to make a segmentation decision. I have performed


Figure 13. A bilaterally symmetric pattern.

Figure 14. (a) Parallel lines; (b) parallel curves.


some informal experiments that show that as the width of
the two-orientation region decreases, the segmentation becomes
more difficult. Riley's demonstration showed that
the segmentation is very difficult when the width is small.

Proposal 

Suppose, then, that virtual line segments are constructed
in the image between each dot and some small
number of its nearest neighbors. Let the "weight" of each
segment be inversely related (in some as-yet unspecified
way) to the distance between the dots at its ends. Then
at each dot location in the image there will be a number
of oriented virtual line segments, some of them weighted


Figure 15. The texture difference in this image is brought
about by a difference in the regularity of placement of the
dots on the two sides. On the left, the variation v_s = 0.4; on
the right it is v_s = 0.1. The right side looks more regular.


Figure 16. A difference in orientation variability creates a
texture difference. The mean orientation is the same on both
sides (60°), but on the left it varies by ±10° while on the
right it varies by ±30°.

Figure 17. Line segments of a single orientation, with small
variability, can be segmented from random orientations. The
segments on the left are oriented at 40 ± 15°; on the right they
are randomly oriented.

Figure 18. A region composed of lines with just two orientations
can be segmented from a region composed of randomly
oriented lines. On the left the orientations are 40 and 80 degrees.


more than the others. Suppose now that the image consists
of a linear string of dots, with no surround. Then all the
virtual line segments will be aligned with this linear feature.
If we now surround the dotted line with random dots
(of sufficient sparseness), the predominant orientations of
the virtual line segments will still be along the line. It
is clear that as the distance between the dots in the surround
gets smaller and approaches the distance between
the dots in the line, the orientations of the virtual lines
become more and more random since those orientations
are influenced to a greater degree by nearby random dots.
If we could segment regions of small variability in orientation
from regions of large variability, we could perhaps
detect the target.

To see whether this approach is useful, I performed
some simulations of the process of forming virtual lines on
images of dotted lines embedded in a random surround.
The weighting function I used for these simulations was
w(r) = e^(-αr), as suggested by Caelli et al. (1978). The
value of α was 0.05 (pixels^-1). Two ways to process the
results of this virtual-line finding process come to mind:
the first is to use all the virtual line segments that are generated,
and the second is to form a weighted average of
the orientations. In the second approach, it is necessary to
combine the orientations of line segments with orientations
θ and θ + 180°; this can be done by the trick (see Caelli et
al. (1978) and Kass and Witkin (1985)) of first multiplying
the angle by two, then adding the resulting vectors, and finally
dividing the angle of the vector sum by two to obtain
the resultant virtual line. I tried both of these approaches,
and present some results in Figures 19 and 20. Note that
when the dots in the line are close together compared with
those in the surround, as in Figure 19, the predominant
orientation of virtual segments in the line is along the line,
while in the surround it is mostly random. Compare this
to Figure 20, where d_t is about the same as d_s; the experiments
reported earlier show that it is difficult for people
to detect the line, and the orientations indeed appear to
be mostly random along the line.
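
The virtual-line construction and the angle-doubling average can be sketched as follows. This is a minimal reconstruction from the description above rather than the simulation code itself: the function names are assumptions, the default of 10 nearest neighbors is taken from the caption of Figure 19, and the weight is w(r) = e^(-αr) with α = 0.05.

```python
import math

def virtual_lines(dots, k=10, alpha=0.05):
    """For each dot, return (orientation, weight) pairs to its k nearest neighbors."""
    result = []
    for i, (x, y) in enumerate(dots):
        others = sorted((math.dist((x, y), q), q)
                        for j, q in enumerate(dots) if j != i)
        segments = []
        for r, (qx, qy) in others[:k]:
            theta = math.atan2(qy - y, qx - x) % math.pi  # orientation in [0, pi)
            segments.append((theta, math.exp(-alpha * r)))
        result.append(segments)
    return result

def mean_orientation(segments):
    """Weighted average orientation using the angle-doubling trick."""
    # Doubling the angle maps theta and theta + 180 degrees to the same vector.
    sx = sum(w * math.cos(2.0 * t) for t, w in segments)
    sy = sum(w * math.sin(2.0 * t) for t, w in segments)
    return (0.5 * math.atan2(sy, sx)) % math.pi
```

For an isolated string of collinear dots, every virtual line and hence every mean orientation lies along the string; adding a dense random surround pulls the mean orientations toward random values.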

How could this field of oriented line segments be used
to detect linear features (or arcs, for that matter)? A
mechanism that separates regions of small variation from
regions of large variation would be required. The experiments
of Caelli noted above show that people can in fact
perform this task. To see how this might be performed
computationally, consider several operators corresponding
to each dot location in the image, each operator aligned
with a different orientation (e.g., an operator aligned every
15°). Each operator responds to dot pairs in a small range
of orientations, and its output varies inversely with the
distance between the dots. The reader may visualize this
situation as a set of planes aligned with the image, with all
the operators in each plane aligned in the same direction.
(This bears a certain resemblance, not coincidentally, to
the well-known orientation columns that Hubel and Wiesel
discovered in the visual cortex of mammalian brains; see,
for example, Hubel and Wiesel (1968), Hubel et al. (1978),
and Hubel (1979).)
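
One way to picture the stack of orientation planes is the sketch below. It is only an assumed realization of the idea: the function name, the use of 1/r as the inverse-distance output, the restriction to the k nearest neighbors, and the assignment of each dot pair to the single nearest operator are all choices made for this illustration.

```python
import math

def operator_responses(dots, step_deg=15, k=10):
    """Return, for each dot, the summed response of each orientation operator."""
    n_planes = 180 // step_deg            # e.g. 12 planes for 15-degree steps
    responses = []
    for i, p in enumerate(dots):
        planes = [0.0] * n_planes
        neighbors = sorted((math.dist(p, q), q)
                           for j, q in enumerate(dots) if j != i)[:k]
        for r, q in neighbors:            # dots are assumed distinct, so r > 0
            theta = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0
            # Assign the pair to the operator whose preferred orientation is nearest.
            idx = int((theta + step_deg / 2.0) // step_deg) % n_planes
            planes[idx] += 1.0 / r        # output varies inversely with distance
        responses.append(planes)
    return responses
```

A straight string of dots then produces responses concentrated in one plane, whereas random dots spread their responses over many planes.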




Figure 19. Simulation of the virtual-line finding process. (a)
The original image. Parameters are: d_t = 12, d_s = 20. (b)
Oriented virtual lines are formed at each dot location between
that dot and its 10 nearest neighbors. The weight of
each virtual line is represented here by its length, and is
inversely related to the distance between the dots from which it
was formed. (c) The weighted average of all the virtual line
segments is displayed for each dot position. The predominant
orientation of segments along the line is with the line.
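
One way to form a weighted average of undirected segment orientations, as in panel (c), is sketched below. The averaging rule is an assumption: the text does not say how the average was computed, so this sketch uses the standard trick of doubling the angles so that 0° and 180° count as the same orientation. The name `mean_orientation` is hypothetical.

```python
import math


def mean_orientation(segments):
    """Weighted mean of undirected orientations.

    segments -- iterable of (theta_degrees, weight) pairs
    Returns (mean_theta_degrees in [0, 180), strength in [0, 1]).
    A strength near 1 means the segments agree; near 0 means they are
    spread out, as expected for dots in the random surround.
    """
    sx = sy = total = 0.0
    for theta, w in segments:
        # Doubling the angle makes theta and theta + 180 equivalent.
        sx += w * math.cos(2.0 * math.radians(theta))
        sy += w * math.sin(2.0 * math.radians(theta))
        total += w
    if total == 0.0:
        return 0.0, 0.0
    mean = (0.5 * math.degrees(math.atan2(sy, sx))) % 180.0
    return mean, math.hypot(sx, sy) / total
```

The returned strength could serve as a simple per-dot measure of how strongly one orientation predominates, though the text leaves the actual clustering step open.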


Now, a set of randomly oriented lines in the image
will produce output from a corresponding random set of
operators. But a region of constant, or slightly varying,
orientations surrounded by a region of random orientations
will produce output in a region of operators on one plane,
with random output in the other planes. The problem then
reduces to one of finding a region of differing density of
features, that is, finding a cluster. This problem is somewhat
similar to the problem of finding intensity edges in images,
except that the data points are much more sparse. I am
currently designing an algorithm to perform this clustering
process.

So it appears that the problem of finding linear strings
of dots in noisy surrounds can be solved by the method of
virtual line orientations. What about the other two related
problems posed above? Let us consider the problem of
discriminating two textures that differ in the regularity of dot
placement. If virtual lines are formed between nearby dots
on each side, then the orientations of the virtual lines on the
more regular side will tend to line up in two dominant
directions (in this case, with the horizontal and vertical axes);
those on the other side will be more random. There will be
two predominant orientations in the more regular side, and
none in the random side. Two regions in two orientation
planes will produce output, while the rest will be random.
Recall the demonstration in Figure 18 that showed that it
is possible to segment a texture composed of lines at two
orientations from one with lines at random orientations.
The orientation-column approach outlined above should
be sufficient to perform this task as well. Thus, this
mechanism seems to be sufficient to detect regularity differences
such as the one demonstrated here.

The mechanism may also be able to detect symmetry
in patterns. At first glance, one might expect that there
would be a predominant orientation across the axis of
symmetry, corresponding to symmetric pairs of dots. However,
the simple one-level mechanism presented here is not
sufficient to detect most instances of symmetry. For example,
in Figure 13, if we apply the virtual-line forming process,
only the dots near the axis of symmetry are close enough
to their symmetric mates for a virtual line of significant
weight to be formed. In most places, the virtual
orientations are random and cannot help us find the symmetry.
However, I believe that the mechanism can still be used.
If higher-level features, such as short line segments and
clusters, are formed on each side, and the original dots
removed, then the algorithm can be applied to the positions
of these features. If appropriate features are chosen so that
the density of features is smaller at the higher level, then
the nearest neighbors of features will include their
symmetric mates. There will then be a peak in orientation of
the virtual line segments across the axis of symmetry; this
should enable the detection of the symmetry.


Figure 20. Another simulation of the virtual line-finding
process, this time with parameters d_t = d_s = 20. The
virtual orientations are essentially random.


Conclusions


I have presented the results of several psychophysical
experiments in preattentive vision. The experiments sought
to determine the important parameters that affect the
detectability of dotted lines and dotted arcs when they are
embedded in noisy backgrounds. Results indicate that the
length of the target is not important, and neither is the
regularity of spacing of the dots along the target's axis. The
amount of deviation from a straight line does affect
performance, as we intuitively expect. The amount of curvature
in an arc also affects performance. The results indicate
that performance seems to depend on the ratio of the
separation of the dots in the target to the separation of those in
the noisy surround, rather than upon either independently.
That is, detection does not seem to be diameter-limited but
rather complexity-limited, in the sense of Binford (1983).

I made some speculations on a mechanism that could
perform this task of detecting dotted targets in a noisy
surround. I claimed that this same mechanism might also
be able to perform the task of finding symmetry and
parallelism in images, as well as segmenting textures that differ
in the spatial regularity of their elements. This work is
clearly in its preliminary stages, but shows promise for
future work in machine detection of patterns such as those
considered here.

Acknowledgements

I would like to thank my advisor Tom Binford for his help
and encouragement, and Brian Wandell for his useful
suggestions. This research was supported by ARPA contract
N00039-84-C-0211.

References

Barlow, H.B. and B.C. Reeves (1979). “The versatility
and absolute efficiency of detecting mirror symmetry in
random dot displays,” Vision Research, Vol. 19, 783-793.

Barlow, H.B. (1980). “The absolute efficiency of perceptual
decisions,” Phil. Trans. R. Soc. Lond. B 290, 71-82.

Binford, T.O. (1983). “Figure/ground: Segmentation
and aggregation,” in Braddick, O.J. and A.C. Sleigh (eds.),
Physical and Biological Processing of Images. New York:
Springer-Verlag.

Caelli, Terry (1981). Visual Perception: Theory and Practice.
Oxford, England: Pergamon Press.

Caelli, T.M., C.A.N. Preston and E.R. Howell
(1978). “Implications of spatial summation models for
processes of contour perception: A geometric perspective,”
Vision Res. Vol. 18, 723-734.

Hubel, D.H. (1979). “The visual cortex of the brain,”
Scientific American, Nov. 1979.

Hubel, D.H. and T.N. Wiesel (1968). “Receptive fields
and the functional architecture of monkey striate cortex,”
J. Physiol. (Lond.) 195, 215-243.

Hubel, D.H., T.N. Wiesel, and M.P. Stryker (1978).
“Anatomical demonstration of orientation columns in
Macaque monkey,” J. Comp. Neur., 177, 361-380.

Kass, M. and A. Witkin (1985). “Analyzing oriented
patterns,” Proc. IJCAI-85.

Lowe, D. (1984). “Perceptual organization and visual
recognition,” Ph.D. Thesis, Computer Science Dept.,
Stanford University.

Lowe, D. (1985). “Visual Recognition from Spatial
Correspondence and Perceptual Organization,” Proc. IJCAI-85.

Neisser, U. (1967). Cognitive Psychology. New York:
Appleton-Century-Crofts.

Riley, Michael D. (1981). “The representation of image
texture,” MIT AI Lab Memo TR-649, September, 1981.
Master's Thesis.

Treisman, A. (1985). “Preattentive processing in vision,”
Comp. Vis., Graphics, and Image Proc., 1-22.

Witkin, Andrew P. and Jay M. Tenenbaum (1982).
“On the role of structure in vision,” in Jacob Beck, B.
Hope, and A. Rosenfeld (eds.), Human and Machine Vision.
New York: Academic Press.

Zucker, Steven W. (1982). “Computational and
psychophysical experiments in grouping: Early orientation
selection,” in Jacob Beck, B. Hope, and A. Rosenfeld (eds.),
Human and Machine Vision. New York: Academic Press.

Zucker, S.W. (1983). “Cooperative grouping and early
orientation selection,” in Braddick, O.J. and A.C. Sleigh
(eds.), Physical and Biological Processing of Images. New
York: Springer-Verlag.




ONE-EYED STEREO: A GENERAL APPROACH TO MODELING

3-D SCENE GEOMETRY


Thomas M. Strat and Martin A. Fischler*

Artificial Intelligence Center
SRI International, Menlo Park, California

October 23, 1985


Abstract 

A single two-dimensional image is an ambiguous representation
of the three-dimensional world (many different scenes could
have produced the same image), yet the human visual system is
extremely successful at recovering a qualitatively correct depth
model from this type of representation. Workers in the field of
computational vision have devised a number of distinct schemes
that attempt to emulate this human capability; these schemes
are collectively known as "shape from ..." methods (e.g., shape
from shading, shape from texture, or shape from contour). In
this paper we contend that the distinct assumptions made in
each of these schemes must be tantamount to providing a second
(virtual) image of the original scene, and that any one of
these approaches can be translated into a conventional stereo
formalism. In particular, we show that it is frequently possible
to structure the problem as one of recovering depth from
a stereo pair consisting of the supplied perspective image (the
original image) and an hypothesized orthographic image (the
virtual image). We present a new algorithm of the form required
to accomplish this type of stereo reconstruction task.


1  Introduction 


The recovery of 3-D scene geometry from one or more images,
which we will call the scene-modeling problem (SMP), has
solutions that appear to follow one of three distinct paradigms:
stereo; optic flow; and shading, texture, and contour.


In the stereo paradigm, we match corresponding world/scene
points in two images; given the relative geometry of the
two cameras (eyes), we can use simple trigonometry to
determine the depths of the matched points [1].

In the optic flow paradigm, we use successive images to
compute the image velocity of scene points. If the camera's
motion is known, we can again use simple trigonometry to
convert measurements in the image to depths in the scene.

In the shape from shading, texture, and contour (SSTC)
paradigm, we must either know or make some assumptions
about the nature of the scene, the illumination, and the imaging
geometry. Brady's 1981 volume on computer vision [2] contains
an excellent collection of papers, many of which address


*The work reported herein was supported by the Defense Advanced
Research Projects Agency under Contract MDA903-83-C-[number illegible].


the problem of how to recover depth from the shading, texture,
and contour information visible in a single image. Two
distinct computational approaches have been employed in the
SSTC paradigm: (1) integration of partial differential equations
describing the relation of shading in an image to surface
geometry in a scene, and (2) back-projection of planar image facets to
undo the distortion in an image attribute (e.g., edge orientation)
induced by the imaging process on an assumed scene property
(e.g., uniform distribution of edge orientations).

Our purpose in this paper is to provide a unifying framework
for the scene-modeling problem, and to present a new
computational approach to recovering scene geometry from the shading,
texture, and contour information in a single image. Our
contribution is based on the following observation: regardless of the
assumptions employed in the SSTC paradigm, if a 3-D scene
model has been derived successfully, it will usually be possible
to establish a large number of correspondences between image
and scene (model) points. From these correspondences we can
compute a collineation matrix [1], and then extract the imaging
geometry from it. We can now construct a second
image of the scene as viewed by the camera from some arbitrary
location in space. It is thus obvious that any technique that
is competent to solve the SMP must either be provided with
at least two images or make assumptions that are equivalent to
providing a second image. We can unify the various approaches
to the SMP by converting their respective assumptions and
auxiliary information into the implied second image and employing
the stereo paradigm to recover depth; in the case of the SSTC
paradigm, our approach amounts to "one-eyed stereo."

2 Shape from One-Eyed Stereo

Most people viewing Figure 1 get a strong impression of
depth. We can recover an equivalent depth model by assuming
that we are viewing a projection of a uniform grid and employing
the computational procedure to be described. In the remainder
of this paper we will show how some simple modifications and
variations of the uniform grid, as the implied second image,
allow us to recover depth from shading, texture, and contour.

The one-eyed stereo paradigm can be described as a five-step
process, as outlined in the paragraphs below. Some scenes with
special surface markings or image-formation processes must be
analyzed by variants of the algorithm described, but the general
approach remains the same.


Figure 1: Wire Room


2.1  Partition  the  Image 

As with all approaches to the SMP, the image must be
segmented into regions prior to the application of a particular
algorithm. Before the one-eyed stereo computation can be employed,
the partitioning process must delineate regions that are
individually in conformance with a single model of image formation.
The computation can then be carried out independently in each
region, and the results fitted together.

2.2  Select  a  Model 

For each region identified by the partitioning process, we
must decide upon the underlying model of image formation that
explains that portion of the image. Surface reflectance functions
and texture patterns are examples of such models. Partitioning
of the image and selection of the appropriate models are difficult
tasks that are not addressed in this paper. Witkin and Kass [23]
are exploring a new class of techniques that promises some
eventual answers to these questions. Generally, it will be impossible
to recover depth whenever a single model cannot be associated
with a region. Similarly, inaccurate or incorrect results can be
expected if the partitioning or modeling is performed incorrectly.

2.3  Generate  the  Virtual  Image 

The key to one-eyed stereo is using the model of image
formation to fabricate a second (virtual) image of the scene. The
idea is that the model often allows one to construct an image
that is independent of the actual shape of the imaged surface.
This allows the virtual image to be depicted solely from
knowledge of the model without making use of the original image. For
example, the markings on the surface of Figure 2(a) could have
arisen from projection of a uniform grid upon the surface. For
all images that fit the model, we can use a uniform grid as the
virtual image. As a rule, the orientation, position, and scale of
the grid will be unknown; however, we will show how this
information can be recovered from the original image. Other models
give rise to other forms of virtual images.

2.4 Determine Correspondences

Before applying stereo techniques to calculate depths, we
must first establish correspondences between points in the real
image and the virtual image. When dealing with textures, the
process is typified by counting texels in each image from a
chosen starting point. With shaded images, the general approach
is to integrate intensities. Several variants of the method for
establishing correspondences are described in the next section.
The difficulty of the procedure, it should be noted, will depend
on the nature of the model.


Figure 2: (a) A projected texture (b) Its virtual image


2.5  Compute  Depths  Using  Stereo 

With two images and a number of point-to-point
correspondences in hand, the techniques of binocular stereo are
immediately applicable. At this point, the problem has been reduced to
computing the relative camera models between the two images
and using that information to compute depths by triangulation.
The fact that the virtual image will normally be an orthographic
projection required reformulation of existing algorithms for
performing this computation. The appendix describes a new
algorithm that computes the relative camera model and reconstructs
the 3-D scene from eight point correspondences between a
perspective and an orthographic image.

The problem of recovering scene and imaging geometry from
two or more images has been addressed by workers not only in
binocular stereo, but also in monocular perception of motion in
which the two projections are separated in time as well as space.
Various approaches have been employed to derive equations for
the 3-D coordinates and motion parameters; these equations are
generally solved by iterative techniques [5] [?] [13] [?]. Ullman
[21] presents a solution for recovering 3-D shape from three
orthographic projections with established correspondences among
at least four points. His "polar equation" allows computation
of shape when the motion of the scene is restricted to rotation
about the vertical axis with arbitrary translation. Nagel and
Neumann [10] have devised a compact system of three
nonlinear equations for the unrestricted problem when five point
correspondences between the two perspective images are known.
More recently, Huang [20] and Longuet-Higgins [9] have
independently derived methods requiring only that a set of eight
simultaneous linear equations be solved when eight point
correspondences between two perspective images are known. In our
formulation we are faced with a stereo problem involving a
perspective and an orthographic image; while the aforementioned
references are indeed germane, none provides a solution to this
particular problem.

The derivation described in the appendix was inspired by the
formulation of Longuet-Higgins for perspective images. When
either image nears orthography, Longuet-Higgins's method
becomes unstable; it is undefined if either image is truly
orthographic. Moreover, his approach requires knowledge of the focal
length and principal point in each image, while our method was
derived specifically for one orthographic and one perspective
image whose internal imaging parameters may not be fully known.


Figure 3: (a) The original image (b) The virtual image


3  Variations  on  the  Theme 

In this section we illustrate how our approach is used with
several models of texture, shading, and contour. Where these
models do not match given scene characteristics, they may
require additional modification. However, a qualitatively correct
answer might still be obtainable by applying one of the specific
models we discuss below to a situation that appears to be
inappropriate, or to an image in which the validity of the assumptions
cannot be established.


3.1 Shape from Texture

Surface shapes are often communicated to humans
graphically by drawings like Figure 2(a). Such illustrations can also
be interpreted by one-eyed stereo. In this case, there is no need
to partition the image; the underlying model of the entire scene
consists of the intersections of lines distributed in the form of a
square grid. When viewed directly from above at an infinite
distance, the surface would appear as shown in the virtual image of
Figure 2(b) regardless of the shape of the surface. This virtual
image can be construed as an orthographic projection of the
object surface from a particular, but unknown, viewing
direction. Correspondences between the original and virtual images
are easily established if there are no occlusions in the original
image. Select any intersection in the original image to be the
reference point and pair it with any intersection in the virtual
image. A second corresponding pair can be found by moving
to an adjacent intersection in both images. Additional pairs are
found in the same manner, being careful to correlate the motions
in each image consistently in both directions. When occlusions
are present, it may still be possible to obtain correspondences
for all visible junctions by following a nonoccluded path around
the occlusion (such as the hill in the foreground of Figure 2(a)).
If no such path can be found, the shape of each isolated region
can still be computed, but there will be no way to relate the
distances without further information. Other techniques used
to represent images of 3-D shapes graphically may require other
virtual images. Figure 3(a), for example, would imply a virtual
image as shown in Figure 3(b). Methods for recognizing which
model to apply are needed, but are not discussed here.

Once correspondences have been determined, we can use the
algorithm given in the appendix to recover depth. We have
presumably one perspective image and one orthographic image
whose scale and origin are still unknown. The depths to be
recovered will be scaled according to the scale chosen for the
virtual image¹. The choice of origin for the orthographic image
is arbitrary, and will lead to the same solution regardless of the
point chosen. The appendix shows how to compute both the
orientation and the displacement of the orthographic coordinate
system, relative to the perspective imaging system. 3-D
coordinates of each matched point are then easily computed by means
of back-projection. A unique solution will be obtained whenever
the piercing point or focal length of the perspective image is
known. A minimum of eight pairs of matched points is required
to obtain a solution; depths can be computed for all matched
points.


Figure 4: The streets in this scene resemble a projected texture.
[3]
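
The back-projection step can be illustrated with a small sketch. It assumes the relative camera model is already known (the appendix algorithm that recovers it is not reproduced here), and the parameterization is hypothetical: the perspective camera sits at the origin with focal length `f`, and the orthographic view is described by image axes `a` and `b`, view direction `c`, and an `origin` offset, all expressed in the perspective frame. Each matched pair then defines two lines in space, and the sketch returns the midpoint of the shortest segment between them.

```python
def back_project(uv, f, pq, a, b, c, origin=(0.0, 0.0, 0.0)):
    """Triangulate one match between a perspective and an orthographic image.

    uv     -- (u, v) point in the perspective image
    f      -- focal length of the perspective camera (assumed known)
    pq     -- (p, q) matched point in the orthographic (virtual) image
    a, b   -- orthographic image axes, as 3-vectors in the perspective frame
    c      -- orthographic viewing direction (3-vector)
    origin -- 3-D position of the orthographic image origin
    """
    dot = lambda x, y: sum(xi * yi for xi, yi in zip(x, y))
    d1 = (uv[0], uv[1], f)  # perspective ray through the origin
    # A point on the orthographic ray; the ray runs along c.
    p2 = tuple(o + pq[0] * ai + pq[1] * bi for o, ai, bi in zip(origin, a, b))
    A, B, C = dot(d1, d1), dot(d1, c), dot(c, c)
    D, E = dot(d1, p2), dot(c, p2)
    det = A * C - B * B
    if det == 0.0:
        raise ValueError("rays are parallel; depth is undetermined")
    # Least-squares parameters of the closest points on the two lines.
    s = (D * C - B * E) / det
    t = (B * D - A * E) / det
    on_ray1 = tuple(s * x for x in d1)
    on_ray2 = tuple(p + t * x for p, x in zip(p2, c))
    return tuple(0.5 * (x + y) for x, y in zip(on_ray1, on_ray2))
```

With noise-free correspondences the two lines intersect and the midpoint is the scene point itself; with noisy matches it is a reasonable compromise between the two rays.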

There exists a growing literature on methods to recover shape
from natural textures [7][12][18][22]. We will now show how
the constraints imposed by one type of natural texture can be
exploited to obtain similar results by using one-eyed stereo.

Consider the pattern of streets in Figure 4. If this city were
viewed from an airplane directly overhead at high altitude, the
streets would form a regular grid not unlike the one used as the
virtual image in Figure 2. There are many other scene attributes
that satisfy this same model. The houses in Figure 5 would
appear to be distributed in a uniform grid if viewed from directly
overhead. In an apple orchard growing on a hillside, the trees
would be planted in rows that are evenly spaced when measured
horizontally; the vineyard in Figure 6 exhibits this property.

Ignoring the nontrivial tasks of partitioning these images into
isotextural regions, verifying that they satisfy the model, and
identifying individual texels, it can be seen how these images
can be interpreted with the same techniques as were described
in the previous section. The virtual image in each case will be
a rectangular grid that can be considered as an orthographic
view from an unknown orientation. Correspondences can be
established by counting street intersections, rooftops, or grape
vines. As before, one can solve for the relative camera model
and compute depths of matched points. Obviously, for the
situations discussed here, we must be satisfied with a qualitatively
correct interpretation, not only because of the difficulty of
locating individual texels reliably and accurately, but also in view
of the numerical instabilities arising from the underlying
nonlinear transformation.

¹Recall that the original image does not contain the information necessary
to recover the absolute size of the scene.


Figure 6: These grapevines exhibit a regular texture. [3]

3.2  Shape  from  Shading 

For our purposes, surface shading can be considered the
limiting case of a locally uniform texture distribution (as the
texels approach infinitesimal dimensions). To compute
correspondences, we need to integrate image intensities appropriately in
place of counting lines, since the image intensities can be seen to
be related to the density of lines projected on the surface. The
feasibility of this procedure depends on the reflectance function
of the surface.

What types of material possess the special property that
allows their images to be treated like the limiting case of the
projected textures of the previous section? The integral of intensity
in an image region has to be proportional to the number of texels
that would be projected in that region. If the angles i and e are
defined as depicted in Figure 7, it can be seen that the number
of texels projected onto a surface patch will be proportional to
cos i, the cosine of the incident angle. At the same time, the
surface patch (as seen from the viewpoint) will be foreshortened
by cos e, the cosine of the emittance angle. Thus, the integral
of reflected light intensity over a region will be proportional to
the flux of the light striking the surface if the intensity of the
reflected light at any point is proportional to cos i / cos e. Horn
[6] has pointed out that, when viewed from great distances, the
material in the maria of the moon and other rocky, dusty objects
exhibits a reflectance function that allows recovery of the ratio
cos i / cos e from the imaged intensities. This surface property
has made possible unusually simple algorithms for computing
shape-from-shading, so it is not surprising that it submits easily
to one-eyed stereo as well.


Figure 7: The geometry of surface illumination

To interpret this type of shading, we can construct a virtual
image whose direction of view is the lighting direction (taken
from a "virtual camera" located at the light source). When the
original shaded image is orthographic, we consider a family of
parallel lines in which each line lies in a plane that includes
both the light source and the (distant) viewpoint. When viewed
from the light source, the image of the surface corresponding to
these lines will also be a set of parallel lines regardless of the
shape of the surface. These parallel lines constitute the virtual
image. We must use the image intensities to refine these
line-to-line correspondences to point-to-point correspondences. Figure
8 shows the geometry for an individual line in the family. A
little trigonometry shows that

$$ \Delta s' = \frac{\cos i}{\cos e} \, \Delta s \tag{1} $$


where $\Delta s$ is a distance along the line in the real image and $\Delta s'$
is the corresponding distance along the corresponding line in the
virtual image. Integrating this equation produces the following
expression, which defines the point correspondences in the two
images along the given line.


$$ s' = s'_0 + \int_{s_0}^{s} \frac{\cos i}{\cos e} \, ds \tag{2} $$


To use this equation we must first compute $\cos i / \cos e$ from
the intensity value at each point along the line. This will, of
course, be possible only when the reflectance function is constant
for constant $\cos i / \cos e$. Next we choose a starting point in
the shaded image and begin integrating intensities according to
Equation (2). For any value of $s$, the corresponding virtual image
point is along a straight line at a distance $s'$ from the virtual
reference point. With these point-to-point correspondences in hand,
it is a simple matter of triangulation to find the 3-D coordinates
of the surface points, given that we know the direction to the
light source. We can explore the remainder of the surface by
repeating the process for each of the successive parallel lines in
the image. Adjacent profiles still remain unrelated to each other,
since their individual scale factors have not yet been ascertained.
Knowledge of the actual depth of one point along each profile
provides the necessary additional information to complete the
reconstruction. It is important to note that our assumptions
and initial conditions are those used by Horn; the fact that he
was able to obtain a solution under these conditions assured
the existence of a suitable virtual image for the one-eyed stereo
paradigm.

Figure 8: The geometry along a line in the direction of the light
source
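The integration of Equation (2) along one image line can be sketched numerically. This is only a minimal illustration, assuming the ratio $\cos i / \cos e$ has already been recovered from the intensities at each sample (that step depends on the reflectance function and is not shown). The function name `virtual_distances` and the use of the trapezoid rule are assumptions made for the example, not part of the method described above.

```python
from typing import List, Sequence


def virtual_distances(s: Sequence[float],
                      ratio: Sequence[float],
                      s_virtual_start: float = 0.0) -> List[float]:
    """Cumulative trapezoid-rule integral of ratio = cos(i)/cos(e).

    s      -- sample positions along the line in the real image
    ratio  -- cos(i)/cos(e) at each sample (assumed already computed
              from the image intensities)
    Returns the virtual-image distance s' for every sample, starting
    from s_virtual_start at the first sample.
    """
    if len(s) != len(ratio):
        raise ValueError("s and ratio must have the same length")
    if not s:
        return []
    s_virtual = [s_virtual_start]
    for k in range(1, len(s)):
        step = s[k] - s[k - 1]
        mean_ratio = 0.5 * (ratio[k] + ratio[k - 1])
        s_virtual.append(s_virtual[-1] + mean_ratio * step)
    return s_virtual
```

Each returned value pairs a sample on the real image line with a position on the corresponding virtual image line, which is the point-to-point correspondence that the triangulation step needs.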

For shaded perspective images, we must integrate along a family
of straight lines that radiate from the point in the image that
corresponds to the location of the light source. This ensures that
the image line will be in a plane containing both the viewer and
the light source, and that the virtual image of each line will
also be a straight line. The integration becomes a bit more
complex than shown in Equation 2 because the nonlinear effects
of perspective imaging must be accommodated. Nevertheless,
it remains possible to establish point-to-point correspondences
between images and to reconstruct the surface along each line.


3.3  Shape  from  Contour 

It is sometimes possible to extract a line drawing, such as
the one shown in Figure 9, from scene textures. Parallel streets
like those encountered in Figure 1 give rise to a virtual image
consisting of parallel lines when the cross streets cannot be
located; terraced hills also produce a virtual image of parallel
lines. Correspondences between real and virtual image lines can
be found by counting adjacent lines from an arbitrary starting
point. This matches a virtual image line with each point in the
real image. Point-to-line correspondences are not sufficient to
enable the stereo computation of the appendix to be used for
reconstruction of the surface. Knowledge of the relative
orientation between the two images (equivalent to knowing the
orientation of the camera that produced the real image relative to
the parallel lines in the scene) provides an adequate constraint;
the surface can then be reconstructed uniquely through
back-projection. Without knowledge of the relative orientation of the
virtual image, heuristics must be employed that relate points
on adjacent contours so that a regular grid can be used as the
virtual image. The human visual system is normally able to
interpret images like Figure 9 unambiguously, although just what
assumptions are being made remains unclear. Further study of
this phenomenon may make it possible to extract models that
are especially suited to the employment of one-eyed stereo on
this type of image without requiring prior knowledge of the
virtual orientation.

Figure 9: (a) An image of contours (b) Its virtual image
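When the relative orientation is known, the back-projection step can be sketched as a ray-plane intersection. The sketch below rests on several assumptions that are not stated in the text: the center of projection is at the origin, the principal point coincides with the image origin, and each counted contour line is modeled as a plane `normal . X = offset` whose normal and spacing are taken as known. The names `back_project` and `contour_offset` are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def contour_offset(first_offset: float, spacing: float, line_index: int) -> float:
    """Plane offset of the contour reached by counting line_index lines."""
    return first_offset + spacing * line_index


def back_project(x1: float, x2: float, f: float,
                 normal: Sequence[float], offset: float) -> Vec3:
    """Intersect the viewing ray through image point (x1, x2) with the
    plane normal . X = offset (all in viewer-centered coordinates)."""
    direction = (x1, x2, f)  # ray from the center of projection
    denom = sum(n * d for n, d in zip(normal, direction))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the contour plane")
    t = offset / denom
    return (t * direction[0], t * direction[1], t * direction[2])
```

Repeating this for every image point on every counted contour yields a set of 3-D points; the overall scale is fixed only by the assumed spacing between planes.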


3.4  Distorted  Textures  and  Unfriendly  Shading

We have already noted that image shading can be viewed
as a limiting (and, for our purposes, a degenerate) result of
closely spaced texture elements. To recover depth from shading,
we must use integration instead of the process of counting the
texture elements that define the locations of the "grid lines"
of our virtual image. The integration process depends on the
existence of a "friendly" reflectance function and an imaging
geometry that allows us to convert distance along a line in the
actual image to a corresponding distance along a line in the
virtual image.

The recovery of lunar topography from a single shaded image
[6], as discussed in Section 3.2, is one of the few instances
in which "shape from shading" is known to be possible without
a significant amount of additional knowledge about the scene.
Nevertheless, even here we are required to know the actual
reflectance function, the location of the (point) source of
illumination, and the depths along a curve on the object surface,
and be dealing with a portion of the surface that has constant
albedo. Furthermore, the reflectance function has to have just the
property we require to replace direct counting, i.e., the
reflectance function has to compensate exactly for the
"foreshortening" of distance due to viewing points on the object
surface from any angle. Most of the commonly encountered
reflectance functions, such as Lambertian reflectance, do not
possess this friendly property, and it is not clear to what extent
it is possible to recover depth from shading in such cases (e.g.,
see Pentland [12] and Smith [15]). Additional assumptions will
probably be necessary and the qualitative nature of the recovery
will be more pronounced. Just as in the case in which a complex
function can be evaluated by making a local linear approximation
and iterating the resulting solution, so it may be possible to deal
with unfriendly, or even unknown, reflectance functions by assuming
that they are friendly in the vicinity of some point, solving
directly for local shape by using the algorithm applicable to the
friendly case, and then extending the solution to adjacent
regions. We are currently investigating this approach.

Figure 10: This simple drawing has two reasonable interpretations.
It is seen as curved roller-coaster tracks if the lines are
assumed to be the projection of a rectangular grid, or as a
volcano when the lines are assumed to be the projection of a
circular grid.

The uniform rectangular grid and the polar grid that we used
as virtual images to illustrate our approach to one-eyed stereo
are effective in a large number of cases because there are
processes operating in the real world that produce corresponding
textures (i.e., gridlike textures that appear to be
orthographically projected onto the surfaces of the scene).
However, there are also textures that produce similar-appearing
images, but are due to different underlying processes. For example,
a uniform gridlike texture might have been created on a flat piece
of terrain that is subsequently subjected to geologic deformation;
in this case the virtual image (or the recovery algorithm) needed
to recover depth must be different from the projective case. We
have already indicated the problem of choosing the appropriate
model for the virtual image and, as noted above, image appearance
alone is probably insufficient for making this determination;
some semantic knowledge about the scene is undoubtedly
essential. Figure 10 shows an example in which two completely
different, yet equally believable, interpretations of scene
structure result, depending on whether we use the rectangular grid
model or the polar grid model.


4  Experimental  Results 

The stereo reconstruction algorithm described in the appendix
has been programmed and successfully tested on both
real and synthetic imagery. Given a sparse set of image points
and their correspondences in a virtual image, a qualitative
description of the imaged surface can be obtained.

Synthetic images were created from surfaces painted with
computer-generated graphic textures. Figure 11(a) shows a
synthetic image constructed from a section of a digital terrain
model (DTM). The intersections of every twentieth grid line
constitute the set of image points made available to the
one-eyed stereo algorithm. Their correspondences were established
by selecting an arbitrary origin and counting grid lines to
obtain virtual image coordinates. When these pairs are processed
by the algorithm in the appendix, a set of 3-D coordinates is
obtained in either the viewer-centered coordinate space, or the
virtual image coordinate space (which, if correct, is aligned with
the original DTM). Figure 11(b) was produced from the resulting
3-D coordinates expressed in the virtual image space by using
Smith's surface interpolation algorithm [16] to fit a surface to
these points. This yields a dense set of 3-D coordinates that can
then be displayed from any viewpoint. The viewpoint that was
computed by one-eyed stereo was used to render the surface as
shown in Figure 11(b). Its similarity to the original rendering
(Fig. 11(a)) confirms the successful reconstruction of the scene.

Figure 11: (a) View of part [...] (b) [...] constructed from (a)

Figure 12: (a) Orthographic view of surface reconstructed from
Figure 4 (b) Perspective view of same surface (from derived
camera location)

The same procedure was followed when we worked with real
photographs. The locations of 31 street intersections were
extracted manually from the photograph of San Francisco shown
in Figure 1. Those that were occluded or indistinct were
disregarded. Virtual image coordinates were obtained by counting
city blocks from the lower-left intersection. The one-eyed stereo
algorithm was then used to acquire 3-D coordinates of the
corresponding image points in both viewer-centered and grid-centered
coordinate systems. A continuous surface was fitted to both
representations of these points. The location and orientation of
the camera relative to the grid were also computed. Figure 12(a)
shows the reconstructed surface as an orthographic view from
the direction computed to be true vertical. The numbers
superimposed are the computed locations of the original points.
Figure 12(b) shows the surface from the derived location of the
viewpoint of the original photo. While several of the original
points were badly mislocated, the general shape of the landform
is apparent.

There are several reasons the algorithm can provide only a
qualitative shape description. First, the problem itself can be
somewhat sensitive to slight perturbations in the estimates of
the piercing point or focal length. This appears to be inherent
to the problem of recovering shape from a single image.
How humans can perceive shape monocularly without apparent
knowledge of the piercing point or semantic content of the
scene remains unresolved. The second factor precluding precise,
quantitative description of shape is the practical difficulty of
acquiring large numbers of corresponding points. While the
algorithm can proceed with as few as eight points, the location of
the object will be identified at those eight points only. If a more
complete model is sought, additional points will be required to
constrain the subsequent surface interpolation.

The task remains to evaluate the effectiveness of the iterative
technique, described in Section 3.4, for recovering (1) shape from
shading in the case of scenes possessing "unfriendly" reflectance
functions, and (2) shape from nonprojective and distorted
textures. Our experience with the process indicates that the key
to surmounting these problems lies in the ability to establish
valid correspondences with the virtual image. With these in
hand, reconstruction of the surface can proceed as outlined in
the foregoing discussion.

5  Conclusion 

In this paper we have shown that, in principle, it is possible
to employ the stereo paradigm in place of various approaches
proposed for modeling 3-D scene geometry, including the case
in which only one image is available. We have further shown
that, for the case of a single image, the approach could be
implemented by

(1) Setting up correspondences between portions of the image
and some variants of a uniform grid, and

(2) Treating each image region and its grid counterpart as a
stereo pair, and employing a stereo technique to recover depth.
(We present a new algorithm that makes it possible to
accomplish this step.)

Devising automatic procedures to partition the image, select
the appropriate form of the virtual image, and establish the
correspondences are all difficult tasks that were not addressed in
this paper. Nevertheless, we have unified a number of apparently
distinct approaches that, individually, also have to contend with
these same pervasive problems (i.e., partitioning, model
selection, and matching).

References 

[1] Barnard, S. T., and Fischler, M. A., "Computational
Stereo," Computing Surveys, Vol. 14, No. 4, December 1982.

[2] Brady, M., ed., Artificial Intelligence (Special Volume on
Computer Vision), Volume 17, Nos. 1-3, August 1981.

[3] Cameron, R., Above San Francisco, Cameron and Company,
San Francisco, 1976.

[4] Ganapathy, S., "Decomposition of Transformation Matrices
for Robot Vision," International Conference on Robotics
(IEEE Computer Society), Atlanta, Georgia, March 13-15, 1984,
pp. 130-139.

[5] Gennery, D. B., "Stereo Camera Calibration," Proceedings
of the Image Understanding Workshop, November 1979, pp. 101-107.

[6] Horn, B. K. P., "Image Intensity Understanding," MIT
Artificial Intelligence Memo 335, August 1975.

[7] Kender, J. R., "Shape from Texture," Ph.D. thesis,
Carnegie-Mellon University, CMU-CS-81-102, November 1980.

[8] Lawton, D. T., "Constraint-Based Inference from Image
Motion," Proc. AAAI-80, pp. 31-34.

[9] Longuet-Higgins, H. C., "A Computer Algorithm for
Reconstructing a Scene from Two Projections," Nature, Vol.
293, September 1981, pp. 133-135.

[10] Nagel, H., and Neumann, B., "On 3-D Reconstruction from
Two Perspective Views," Proc. IEEE 1981.

[11] Nitzan, D., Bolles, R. C., et al., "Machine Intelligence
Research Applied to Industrial Automation," 12th Report,
SRI Project 2996, January 1983.

[12] Pentland, A. P., "Shading into Texture," Proceedings
AAAI-84, August 1984, pp. 269-273.

[13] Prazdny, K., "Motion and Structure from Optical Flow,"
Proc. IJCAI-79, pp. 704-704.

[14] Roach, J. W., and Aggarwal, J. K., "Determining the
Movement of Objects from a Sequence of Images," IEEE
Trans. on Pattern Analysis and Machine Intelligence, Vol.
PAMI-2, No. 6, November 1980, pp. 554-562.

[15] Smith, G. B., "The Relationship between Image Irradiance
and Surface Orientation," Proc. IEEE CVPR-83.

[16] Smith, G. B., "A Fast Surface Interpolation Technique,"
Proceedings: DARPA Image Understanding Workshop,
October 1984, pp. 211-215.

[17] Stevens, K. A., "The Line of Curvature Constraint and
the Interpretation of 3-D Shape from Parallel Surface
Contours," AAAI-83, pp. 1057-1061.

[18] Stevens, K. A., "The Visual Interpretation of Surface
Contours," Artificial Intelligence Journal, Vol. 17, No. 1,
August 1981, pp. 47-73.

[19] Strat, T. M., "Recovering the Camera Parameters from
a Transformation Matrix," Proceedings: DARPA Image
Understanding Workshop, October 1984, pp. 264-271.

[20] Tsai, R. Y., and Huang, T. S., "Uniqueness and Estimation
of Three-Dimensional Motion Parameters of Rigid Objects
with Curved Surfaces," IEEE Trans. on Pattern Analysis
and Machine Intelligence, Vol. PAMI-6, No. 1, January 1984,
pp. 13-27.

[21] Ullman, S., The Interpretation of Visual Motion, The MIT
Press, Cambridge, Mass., 1979.

[22] Witkin, A. P., "Recovering Surface Shape and Orientation
from Texture," Artificial Intelligence Journal, Vol. 17, No.
1, August 1981, pp. 17-45.

[23] Witkin, A., and Kass, M., "Analyzing Oriented Patterns,"
Proceedings IJCAI-85.


Figure 13: Definition of coordinate system (labels: principal
ray, object point, piercing point, image plane, image point,
image origin, center of projection)

6  Appendix 

The main body of this paper was devoted to showing how the
problem of interpreting certain varieties of textured and shaded
images can be transformed into equivalent problems in binocular
stereo. Beginning with a perspective image, a second
(virtual) image is hypothesized according to some presumed model
of the original image. The model also specifies how to establish
the correspondence between points in the two images. To
compute the shape of the surfaces in the original scene, we need
only compute the 3-D coordinates from the information in the
two images, where the actual image is a perspective projection
and the virtual image has been constructed as an orthographic
projection. This appendix shows how three-dimensional
coordinates can be computed from point correspondences between
a perspective and an orthographic projection when the relation
between the imaging geometries is unknown.

We will use lowercase letters to denote image coordinates and
uppercase letters for 3-D object coordinates. Unprimed
coordinates will refer to the geometry of the perspective image, and
primed coordinates to the orthographic image. Let $x_1$ and $x_2$
be the image coordinates of a point in the perspective image
relative to an arbitrarily selected origin. Let $-d_1$ and $-d_2$ be
the (unknown) image coordinates of the principal point and let
$f$ ($> 0$) be the focal length. The object coordinates associated
with an image point are $(X_1, X_2, X_3)$, where the origin coincides
with the center of projection and the $X_3$ axis is perpendicular
to the image plane. The $X_3$ coordinates of any object point will
necessarily be positive.

The imaging geometry is as depicted in Figure 13 and yields
the following standard perspective equations:

$$x_1 + d_1 = f\,\frac{X_1}{X_3}; \qquad x_2 + d_2 = f\,\frac{X_2}{X_3} \qquad (3)$$


For the orthographic image, $x'_1$ and $x'_2$ are the image
coordinates (relative to an arbitrary origin) and $(X'_1, X'_2, X'_3)$
is the world coordinate system defined such that

$$x'_1 = X'_1; \qquad x'_2 = X'_2 \qquad (4)$$

We use the unknown scale factor between orthographic image
coordinates and the scene as our unit of measurement.

The two world coordinate systems can be related as follows:

$$X' = R\,(X - T) \qquad (5)$$

where $X$ is the column vector $[X_1, X_2, X_3]^T$,
$X'$ is the column vector $[X'_1, X'_2, X'_3]^T$,
$R$ is a 3x3 rotation matrix, and
$T$ is a translation vector from the center of perspective projection
to the origin of the world coordinate system associated with the
orthographic projection. For either component ($i = 1$ or $2$), we
can write

$$X'_i = R_i \cdot (X - T) \qquad (6)$$

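where $R_i$ is the $i$-th row of $R$. By substituting Equations 3 and
4 into the above, we obtain

$$X_3 = \frac{f\,(x'_i + R_i \cdot T)}{R_i \cdot [(x_1 + d_1),\ (x_2 + d_2),\ f]} \qquad (7)$$

Eliminating $X_3$ from the two equations in Equation 7 yields

$$\begin{aligned}
0 ={}& x'_1 x_1 R_{21} + x'_1 x_2 R_{22} + x'_1\, R_2 \cdot D \\
& - x'_2 x_1 R_{11} - x'_2 x_2 R_{12} - x'_2\, R_1 \cdot D \\
& + x_1 (R_{21}\, R_1 \cdot T - R_{11}\, R_2 \cdot T) + x_2 (R_{22}\, R_1 \cdot T - R_{12}\, R_2 \cdot T) \\
& + (R_1 \cdot T)(R_2 \cdot D) - (R_2 \cdot T)(R_1 \cdot D)
\end{aligned} \qquad (8)$$

where $D$ is the vector $[d_1, d_2, f]$.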
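A quick numerical check of Equation 8 can be made with synthetic data. The sketch below is an illustration only: it projects a 3-D point into both images using Equations 3 through 6 and evaluates the right-hand side of Equation 8, which should vanish for a true correspondence. The helper names and the Euler-angle parameterization of `rotation` are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation(a: float, b: float, c: float) -> Matrix:
    """Rx(a) Ry(b) Rz(c); any proper rotation matrix would serve."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    rx = [[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rz = [[cc, -sc, 0.0], [sc, cc, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rx, matmul(ry, rz))


def perspective(X: Sequence[float], f: float,
                d: Sequence[float]) -> Tuple[float, float]:
    """Equation 3: x_i + d_i = f X_i / X_3."""
    return (f * X[0] / X[2] - d[0], f * X[1] / X[2] - d[1])


def orthographic(X: Sequence[float], R: Matrix,
                 T: Sequence[float]) -> Tuple[float, float]:
    """Equations 4 and 6: x'_i = R_i . (X - T) for i = 1, 2."""
    diff = [X[j] - T[j] for j in range(3)]
    return (dot(R[0], diff), dot(R[1], diff))


def equation8_residual(x, xp, R, T, d, f) -> float:
    """Right-hand side of Equation 8; zero for a true correspondence."""
    D = (d[0], d[1], f)
    r1t, r2t = dot(R[0], T), dot(R[1], T)
    r1d, r2d = dot(R[0], D), dot(R[1], D)
    return (xp[0] * x[0] * R[1][0] + xp[0] * x[1] * R[1][1] + xp[0] * r2d
            - xp[1] * x[0] * R[0][0] - xp[1] * x[1] * R[0][1] - xp[1] * r1d
            + x[0] * (R[1][0] * r1t - R[0][0] * r2t)
            + x[1] * (R[1][1] * r1t - R[0][1] * r2t)
            + r1t * r2d - r2t * r1d)
```

A residual that is not close to zero signals either a wrong correspondence or an inconsistent choice of R, T, and D.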

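The above equation relates image coordinates of corresponding
points in both images. The following unknowns can be found
by using eight corresponding pairs and solving the system of
eight linear equations:

$$\begin{aligned}
B_1 &= \frac{R_{21}}{R_{11}}, \qquad B_2 = \frac{R_{22}}{R_{11}}, \qquad B_3 = \frac{R_2 \cdot D}{R_{11}}, \qquad B_4 = \frac{R_{12}}{R_{11}}, \qquad B_5 = \frac{R_1 \cdot D}{R_{11}}, \\
B_6 &= \frac{R_{21}}{R_{11}}\, R_1 \cdot T - R_2 \cdot T, \\
B_7 &= \frac{R_{22}}{R_{11}}\, R_1 \cdot T - \frac{R_{12}}{R_{11}}\, R_2 \cdot T, \\
B_8 &= \frac{1}{R_{11}} (R_1 \cdot T)(R_2 \cdot D) - \frac{1}{R_{11}} (R_2 \cdot T)(R_1 \cdot D)
\end{aligned} \qquad (9)$$

$$\begin{bmatrix} x'_1 x_1 & x'_1 x_2 & x'_1 & -x'_2 x_2 & -x'_2 & x_1 & x_2 & 1 \end{bmatrix}
\begin{bmatrix} B_1 & B_2 & B_3 & B_4 & B_5 & B_6 & B_7 & B_8 \end{bmatrix}^T = x'_2 x_1 \qquad (10)$$

with one such row for each corresponding pair.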
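The linear system can be assembled and solved as sketched below. This is a minimal illustration, assuming the unknowns are ordered as $B_1 = R_{21}/R_{11}$, $B_2 = R_{22}/R_{11}$, $B_3 = R_2 \cdot D / R_{11}$, $B_4 = R_{12}/R_{11}$, $B_5 = R_1 \cdot D / R_{11}$, followed by $B_6$, $B_7$, $B_8$, with $x'_2 x_1$ on the right-hand side. The plain Gaussian elimination and all function names are choices made for the example, and `project_pair` exists only to generate synthetic correspondences.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def project_pair(X, R, T, d, f):
    """Synthetic correspondence: Equation 3 and Equations 4 and 6."""
    x = (f * X[0] / X[2] - d[0], f * X[1] / X[2] - d[1])
    diff = [X[j] - T[j] for j in range(3)]
    xp = (dot(R[0], diff), dot(R[1], diff))
    return x, xp


def equation10_row(x, xp) -> Tuple[List[float], float]:
    """Coefficients of B_1..B_8 and right-hand side for one pair."""
    row = [xp[0] * x[0], xp[0] * x[1], xp[0],
           -xp[1] * x[1], -xp[1], x[0], x[1], 1.0]
    return row, xp[1] * x[0]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (square system)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: degenerate correspondences")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (M[r][n] - tail) / M[r][r]
    return solution


def solve_for_B(correspondences) -> List[float]:
    """Solve for B_1..B_8 from exactly eight (x, x') pairs."""
    rows, rhs = zip(*(equation10_row(x, xp) for x, xp in correspondences))
    return solve_linear(list(rows), list(rhs))
```

With more than eight pairs, the same rows would feed a least-squares solver instead of `solve_linear`.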


When more than eight points are available, a least-squares
method can be employed to solve the system of equations. Once
we have the $B_i$'s in hand, we can solve for the components of
the rotation matrix $R$. First, $R_{11}$ can be determined by making
use of the fact that the rows of a rotation matrix are orthogonal.
Thus, from $R_1 \cdot R_2 = 0$ and the expressions for $B_1$, $B_2$
and $B_4$ in Equation 9, we get

$$R_{11}^4 (B_1^2 B_4^2 + B_2^2 - 2 B_1 B_2 B_4) - R_{11}^2 (1 + B_1^2 + B_2^2 + B_4^2) + 1 = 0 \qquad (11)$$




This yields two real values for $R_{11}$; fortunately we'll be able to
identify the incorrect one later. For now, let us simply choose
one at random and return to this point if it turns out to be
wrong.

The rest of $R$ can be derived from the $B_i$'s in a similar fashion.
$R_{12}$, $R_{21}$ and $R_{22}$ can be established immediately from
$R_{11}$ and Equation 9. $R_{13}$ is determined from the fact that
$\|R_1\| = 1$. $R_1 \cdot R_2 = 0$ gives an expression for $R_{23}$.
Finally, $R_3$ is computed from the fact that $R_1 \times R_2 = R_3$
for all rotation matrices. As a result, we have completely derived
two alternative $R$ matrices, depending on the choice of $R_{11}$.
One of these matrices is correct, while the other can be eliminated
later.
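The recovery of the rotation matrix can be sketched as follows. This is an illustration under assumptions: $B_1$, $B_2$, $B_4$ are taken to be $R_{21}/R_{11}$, $R_{22}/R_{11}$, $R_{12}/R_{11}$; roots of Equation 11 are kept only if they leave $R_{13}^2$ and $R_{23}^2$ non-negative; and the sign of $R_{13}$, which $\|R_1\| = 1$ alone does not fix, is exposed as the hypothetical parameter `r13_sign`.

```python
import math
from typing import List


def r11_candidates(b1: float, b2: float, b4: float) -> List[float]:
    """Real solutions of Equation 11 consistent with unit-length rows."""
    a = (b2 - b1 * b4) ** 2      # B1^2 B4^2 + B2^2 - 2 B1 B2 B4
    b = 1.0 + b1 * b1 + b2 * b2 + b4 * b4
    if a < 1e-15:                # equation degenerates to linear in R11^2
        squares = [1.0 / b]
    else:
        root = math.sqrt(max(b * b - 4.0 * a, 0.0))
        squares = [(b - root) / (2.0 * a), (b + root) / (2.0 * a)]
    tol = 1e-9
    valid = [u for u in squares
             if 1.0 - u * (1.0 + b4 * b4) >= -tol       # R13^2 >= 0
             and 1.0 - u * (b1 * b1 + b2 * b2) >= -tol]  # R23^2 >= 0
    candidates = []
    for u in valid:
        candidates.extend([math.sqrt(u), -math.sqrt(u)])
    return candidates


def complete_rotation(r11: float, b1: float, b2: float, b4: float,
                      r13_sign: float = 1.0) -> List[List[float]]:
    """Fill in R from R11 and B1, B2, B4 (sign of R13 is an input)."""
    r12, r21, r22 = b4 * r11, b1 * r11, b2 * r11
    r13 = r13_sign * math.sqrt(max(1.0 - r11 * r11 - r12 * r12, 0.0))
    if abs(r13) < 1e-12:
        raise ValueError("R13 is zero; R23 cannot be found from R1 . R2 = 0")
    r23 = -(r11 * r21 + r12 * r22) / r13
    row1, row2 = [r11, r12, r13], [r21, r22, r23]
    row3 = [row1[1] * row2[2] - row1[2] * row2[1],   # R3 = R1 x R2
            row1[2] * row2[0] - row1[0] * row2[2],
            row1[0] * row2[1] - row1[1] * row2[0]]
    return [row1, row2, row3]
```

The two signed candidates correspond to the two alternative matrices mentioned above; the choice between them is deferred until the signs of the recovered depths are examined.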

Now to solve for the translation vector $T$. First let us note that
$T$ cannot be found uniquely, because the origin of the primed
world coordinate system has not been completely specified. The
$X'_1$ and $X'_2$ coordinates of the origin were fixed by the choice of
origin for the orthographic image coordinates, but the position
of the origin along the $X'_3$ axis is still unconstrained. Since we
are free to choose any origin for $X'_3$, we will choose the one for
which $T_3 = 0$.

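Using the expression for $B_6$ in Equation 9, we find

$$B_6 = (R_{11} T_1 + R_{12} T_2 + R_{13} T_3)\,\frac{R_{21}}{R_{11}} - (R_{21} T_1 + R_{22} T_2 + R_{23} T_3) \qquad (12)$$

Making use of the fact that $R_{33} = R_{11} R_{22} - R_{12} R_{21}$ and
$T_3 = 0$, we get

$$T_2 = -B_6\,\frac{R_{11}}{R_{33}} \qquad (13)$$

Similarly,

$$T_1 = B_7\,\frac{R_{11}}{R_{33}} \qquad (14)$$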
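Equations 13 and 14 translate directly into code. The sketch below is illustrative; `b6_b7` simply evaluates the definitions of $B_6$ and $B_7$ from Equation 9 so that the recovery can be checked on synthetic values, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def b6_b7(R: List[List[float]], T: Sequence[float]) -> Tuple[float, float]:
    """B_6 and B_7 as defined in Equation 9 (used here for checking)."""
    r1t, r2t = dot(R[0], T), dot(R[1], T)
    b6 = R[1][0] / R[0][0] * r1t - r2t
    b7 = R[1][1] / R[0][0] * r1t - R[0][1] / R[0][0] * r2t
    return b6, b7


def translation_from_B(b6: float, b7: float,
                       R: List[List[float]]) -> Tuple[float, float, float]:
    """Equations 13 and 14, with the origin chosen so that T_3 = 0."""
    r11, r33 = R[0][0], R[2][2]
    if abs(r33) < 1e-12:
        raise ValueError("R33 is zero: the two viewing directions are perpendicular")
    return (b7 * r11 / r33, -b6 * r11 / r33, 0.0)
```

The division by $R_{33}$ shows that the translation becomes ill-conditioned as the orthographic viewing direction approaches a right angle to the principal ray.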


The origin of the primed coordinate system in unprimed
coordinates is given by

$$X = T = [T_1,\ T_2,\ 0]^T \qquad (15)$$


If the location of the principal point is known and the focal
length (the scale factor of the perspective image) is not, $f$ can
easily be computed from Equation 9:

$$f = \frac{B_5 R_{11} - R_{11} d_1 - R_{12} d_2}{R_{13}} \qquad (16)$$


If the focal length is known, the principal point of the
perspective image is found as follows. Use the third and fifth
expressions of Equation 9 to write two equations in the two
unknowns, $d_1$ and $d_2$. Their solution yields

$$d_1 = \frac{R_{11}(R_{22} B_5 - R_{12} B_3) + f R_{31}}{R_{33}}; \qquad
d_2 = \frac{R_{11}(R_{11} B_3 - R_{21} B_5) + f R_{32}}{R_{33}} \qquad (17)$$

The perspective image coordinates of the principal point are
$(-d_1, -d_2)$.
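The two cases can be sketched in code. This is an illustration only: it assumes $B_3 = R_2 \cdot D / R_{11}$ and $B_5 = R_1 \cdot D / R_{11}$, and the closed form used for the principal point is the solution of those two linear equations in $d_1$ and $d_2$, written with $R_3 = R_1 \times R_2$. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def focal_length(b5: float, R: List[List[float]],
                 d: Sequence[float]) -> float:
    """Equation 16: f from B_5 when the principal point is known."""
    if abs(R[0][2]) < 1e-12:
        raise ValueError("R13 is zero: f cannot be found from B_5")
    return (b5 * R[0][0] - R[0][0] * d[0] - R[0][1] * d[1]) / R[0][2]


def principal_point_offsets(b3: float, b5: float, R: List[List[float]],
                            f: float) -> Tuple[float, float]:
    """Solve the 3rd and 5th expressions of Equation 9 for d_1, d_2.

    The principal point itself is at (-d_1, -d_2) in image coordinates.
    """
    r11, r33 = R[0][0], R[2][2]
    if abs(r33) < 1e-12:
        raise ValueError("R33 is zero: the two equations are dependent")
    d1 = (r11 * (R[1][1] * b5 - R[0][1] * b3) + f * R[2][0]) / r33
    d2 = (r11 * (r11 * b3 - R[1][0] * b5) + f * R[2][1]) / r33
    return d1, d2
```

Both offsets are linear in f, so varying the focal length moves the vector D along the direction of the third row of R.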

If neither the focal length nor principal point is known
beforehand, then the problem we have proposed does not have a
unique solution. Equation 17 specifies the constraints between
focal length and piercing point. For any choice of focal length,
there exists a unique principal point. The center of perspective
projection is constrained to lie on a line parallel to the lines of
sight of the orthographic projection. The reconstructed surface
will be distorted as one varies the center of projection along this
line. It is worth noting, however, that our computations of the
rotation matrix $R$ and the translation vector $T$ did not require
knowledge of either the focal length or the principal point.

We are now in a position to compute the world coordinates
of all points for which we have correspondences. There may, of
course, be many more than the minimum of eight points used
so far. Equation 6 gives

$$x'_1 = R_1 \cdot \left[ \frac{X_3}{f}(x_1 + d_1),\ \frac{X_3}{f}(x_2 + d_2),\ X_3 \right]^T - R_1 \cdot T \qquad (18)$$

which can be solved for

$$X_3 = \frac{f\,(x'_1 + R_1 \cdot T)}{R_{11} x_1 + R_{12} x_2 + R_1 \cdot D} \qquad (19)$$


Now we must check the signs of the $X_3$'s. If they are negative,
the world points are located behind the center of projection.
The correct solution, corresponding to all positive values of $X_3$,
can be found by choosing the alternative value of $R_{11}$ derived
earlier and repeating the computations from that point. After
obtaining the set of positive $X_3$'s, we can continue.

Equation 3 gives the other unprimed world coordinates:

$$X_1 = \frac{X_3}{f}(x_1 + d_1); \qquad X_2 = \frac{X_3}{f}(x_2 + d_2) \qquad (20)$$

If desired, the primed coordinates can be found by applying
Equation 5.
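The final reconstruction step, including the sign check, can be sketched as follows. This is a minimal illustration assuming R, T, the offsets $d_1$, $d_2$, and $f$ have already been recovered; the function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def reconstruct_point(x: Sequence[float], xp1: float,
                      R: List[List[float]], T: Sequence[float],
                      d: Sequence[float], f: float) -> Tuple[float, float, float]:
    """Equations 19 and 20: viewer-centered coordinates of one point.

    x   -- perspective image coordinates (x_1, x_2)
    xp1 -- first orthographic (virtual) image coordinate x'_1
    """
    D = (d[0], d[1], f)
    denom = R[0][0] * x[0] + R[0][1] * x[1] + dot(R[0], D)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is perpendicular to R_1; use x'_2 instead")
    X3 = f * (xp1 + dot(R[0], T)) / denom          # Equation 19
    X1 = X3 / f * (x[0] + d[0])                    # Equation 20
    X2 = X3 / f * (x[1] + d[1])
    return (X1, X2, X3)


def all_in_front(points: Iterable[Sequence[float]]) -> bool:
    """True when every recovered X_3 is positive (the sign check)."""
    return all(p[2] > 0.0 for p in points)
```

If `all_in_front` fails for the reconstructed set, the alternative value of $R_{11}$ is selected and the computation is repeated from that point.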

The above derivation makes the implicit assumption that the
$X'_1$ and $X'_2$ axes are scaled equally. It is conceivable that the
virtual image coordinates could be unequally scaled, as is the case
when they are derived from a rectangular grid (e.g., Figure ...).
If we have prior knowledge of the ratio of the sides of each
rectangular grid element, then the virtual image coordinates should
be normalized before applying this algorithm (i.e., by dividing
$x'_2$ by this ratio). Without knowledge of the ratio, the problem
is underspecified and a unidimensional class of solutions exists.
Knowledge of the piercing point, if available, can be used to
constrain the problem further and to solve for the unique solution.
To do this, we define the following virtual coordinate system in
place of Equation (4):

$$x'_1 = X'_1; \qquad x'_2 = \frac{1}{k}\,X'_2 \qquad (21)$$


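where $k$ is the ratio of the sides of the rectangular grid element.
The solution proceeds as before, yielding

$$\begin{aligned}
0 ={}& x'_1 x_1 R_{21} + x'_1 x_2 R_{22} + x'_1\, R_2 \cdot D \\
& - k x'_2 x_1 R_{11} - k x'_2 x_2 R_{12} - k x'_2\, R_1 \cdot D \\
& + x_1 (R_{21}\, R_1 \cdot T - R_{11}\, R_2 \cdot T) + x_2 (R_{22}\, R_1 \cdot T - R_{12}\, R_2 \cdot T) \\
& + (R_1 \cdot T)(R_2 \cdot D) - (R_2 \cdot T)(R_1 \cdot D)
\end{aligned} \qquad (22)$$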

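The above equation is recast as the system of eight linear
equations

$$\begin{bmatrix} x'_1 x_2 & x'_1 & -x'_2 x_1 & -x'_2 x_2 & -x'_2 & x_1 & x_2 & 1 \end{bmatrix}
\begin{bmatrix} C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8 \end{bmatrix}^T = -x'_1 x_1 \qquad (23)$$

where

$$\begin{aligned}
C_1 &= \frac{R_{22}}{R_{21}}, \qquad C_2 = \frac{R_2 \cdot D}{R_{21}}, \qquad C_3 = \frac{k R_{11}}{R_{21}}, \qquad C_4 = \frac{k R_{12}}{R_{21}}, \qquad C_5 = \frac{k\, R_1 \cdot D}{R_{21}}, \\
C_6 &= R_1 \cdot T - \frac{R_{11}}{R_{21}}\, R_2 \cdot T, \\
C_7 &= \frac{R_{22}}{R_{21}}\, R_1 \cdot T - \frac{R_{12}}{R_{21}}\, R_2 \cdot T, \\
C_8 &= \frac{1}{R_{21}} (R_1 \cdot T)(R_2 \cdot D) - \frac{1}{R_{21}} (R_2 \cdot T)(R_1 \cdot D)
\end{aligned} \qquad (24)$$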


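The following equalities can then be derived from Equation
(24):

$$R_{13} = \frac{R_{21}}{k f}\,(C_5 - C_3 d_1 - C_4 d_2) \qquad (25)$$

$$R_{23} = \frac{R_{21}}{f}\,(C_2 - d_1 - C_1 d_2) \qquad (26)$$

$$f = \sqrt{\frac{-(C_5 - C_3 d_1 - C_4 d_2)(C_2 - d_1 - C_1 d_2)}{C_3 + C_1 C_4}} \qquad (27)$$

$$k = \sqrt{\frac{f^2 C_3^2 + f^2 C_4^2 + (C_5 - C_3 d_1 - C_4 d_2)^2}{f^2 + f^2 C_1^2 + (C_2 - d_1 - C_1 d_2)^2}} \qquad (28)$$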

The rest of $R$ can now be computed easily from Equation (24)
and $R_1 \times R_2 = R_3$. The translation vector $T$ is obtained as
in Equations (13) and (14), with $C_6$, $C_7$ and $R_{21}$ taking the
places of $B_6$, $B_7$ and $R_{11}$. With $R$ and $T$ now fully
recovered, it is a simple matter to derive the object coordinates
from Eqs. (3), (21), and (5). Let us recall that we have two
candidate matrices $R$ hinging on the choice for $R_{21}$; as before,
the correct one must be selected by examining the signs of the
$X_3$ coordinates.


To summarize, we have described an algorithm to compute the
relative orientation and position between two imaging systems
(perspective and orthographic) from the locations of eight (or
more) corresponding image points. Either the principal point or
the focal length and rectangular aspect ratio are computed along
the way. With this information in hand, the world coordinates
of all points in the imaged scene can be computed.




STEREO  CORRESPONDENCE: 
FEATURES  AND  CONSTRAINTS 


Hong  Seh  Lim  &  Thomas  O.  Binford

Artificial  Intelligence  Laboratory,  Computer  Science  Department 
Stanford  University,  Stanford,  CA  94305,  USA. 


ABSTRACT 

This is a preliminary report on a high level stereo system for
recovering depth of a three dimensional scene from a stereo pair of
images. Junctions, edgels, extended curves and regions are features
used for matching. Geometrical properties of features and relations
among features are used as constraints to guide both local matching
and global matching. The result is a sparse disparity map of the
analysed scene.

Junctions are determined by extending curves and using
intensity values from the image. They are classified and used for
matching. We illustrate how this system can solve two classical
correspondence problems. We show results of correspondence for line
drawings and preliminary results from real images. These results
indicate that this technique renders not only an accurate local match
but also a globally consistent match. The computational complexity is
reduced by two orders of magnitude when compared to some existing
methods.


INTRODUCTION 

A high level stereo system tries to match structures derived
from a pair of stereo images. Structures are extracted from the
image by effective segmentation and aggregation. We try to infer
surfaces and three-dimensional structures by interpreting structures
from images [Binford 1981, Malik 1984]. We believe using high level
structures for matching will give a more reliable and globally
consistent correspondence.

Stereo matching has the advantage of being a passive method
of ranging. It has been applied to various domains, e.g., obstacle
avoidance in navigation [Moravec 1980], aerial cartography [Panton
1978], automatic surveillance [Henderson 1979] and modeling objects
for model-based vision [Takamura 1984].


In stereo matching we want to identify corresponding views of
the same element in the physical scene. Changes in view point
change the images of these corresponding elements. Features chosen
for matching should therefore be invariant or vary slowly with
change of view point. A difficulty is to achieve globally consistent
matching. Local features or areas in one image may match with
more than one feature or area in the other image. These ambiguities
in local matches can be resolved by incorporating global strategies.
The key problems in stereo matching are therefore choosing the
features to be matched, determining constraint relations, and
designing the strategies for matching these features both locally and
globally.

First we give a brief review of previous work in stereo
matching, then we discuss our approach to the problem. Our
technique is based on high level constraints which require high quality
input. The Nalwa edge segmenter [Nalwa 85] provides extended
curves and junctions. As part of this stereo system, we have
improved junction determination significantly. We test our technique
on some curve drawings and then present some preliminary results of
applying this technique to real images. An analysis of the complexity
of computation is also given.

REVIEW 

Two principal techniques have been used in stereo matching,
area-based matching [Hannah 1974, Panton 1978] and feature-based
matching [Baker 1981, Medioni 1985].

Area-based matching tries to match an area of pixels in one
image to an area in the other image. Ideally one would like to locate a
corresponding pixel for each pixel in both images. But the
information in one pixel is not enough to resolve the ambiguity in
matching. As a result, a small window is chosen as the matching
unit. A window in one image is matched with a range of windows in
the other image using cross-correlation or a similar measure of the
similarity between two windows.
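
The window matching described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any system cited here: it assumes one-dimensional windows taken along a single scan line, uses normalized cross-correlation as the similarity measure, and the names `ncc`, `best_match`, `half_width` and `max_disparity` are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A textureless (constant) window has zero variance and cannot be scored.
    return num / den if den > 0 else 0.0

def best_match(left_row, right_row, x, half_width, max_disparity):
    """Search a range of windows in right_row for the one most similar to
    the window centered at x in left_row. Returns (disparity, score)."""
    window = left_row[x - half_width: x + half_width + 1]
    best_d, best_score = None, -2.0
    for d in range(max_disparity + 1):
        c = x - d  # assumed convention: features shift left in the right image
        if c - half_width < 0:
            break
        candidate = right_row[c - half_width: c + half_width + 1]
        score = ncc(window, candidate)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

Note that a constant window scores zero against everything, which is one way to see why this kind of matching needs detectable texture.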




Area-based matching has been applied with partial success in
terrain mapping. However, it requires the presence of detectable
texture for matching. It tends to break down when the scenes have
textureless areas, repetitive texture, or surface discontinuities. At
surface discontinuities there are occlusions and no possible
correspondence between areas crossing the occlusion. The accuracy
of correspondence in area-based matching depends on the window
size and is generally an order of magnitude less than that of feature-based
matching. Computation time can be reduced by matching only
areas that are of particular interest, e.g. with large variance.

In feature-based matching, features such as junctions or edges are
extracted from the intensity image for matching. The underlying
principle is that a discontinuity in the intensity represents a
discontinuity on the physical surfaces in the scene. The discontinuity
can be due to surface depth, orientation, reflectance, or illumination.
All these curve discontinuities occur at points on the physical
surfaces. By matching these features we match physical curves on
object surfaces, except at limbs.

The location of curve features in an image can be estimated to
sub-pixel accuracy [MacVicar-Whelan 1981]. The accuracy of the
recovered three dimensional depth is higher than that of area-based
matching. Note that the number of features is in general less than
the number of pixels. As a result, the computation time is less.
Since not every point in an image corresponds to a feature, feature
matching leads only to a sparse depth map. To produce a dense
depth map, it must be accompanied by model-based interpretation,
surface interpolation, or by area matching.

The search space for matching features can be greatly reduced
by knowing the geometric relationships between the cameras used in
taking the images. The family of planes passing through the object and
the two camera foci are called epipolar planes. Epipolar lines are the
intersections of the epipolar planes with the image planes. Any image
point lying on a particular epipolar line will find its corresponding
point on the corresponding epipolar line in the other image.

In particular, if we restrict the camera geometry to be such
that the principal horizontal curves of both images are collinear and the
principal vertical curves of both images are parallel, that is, the two
cameras are related by a horizontal displacement, then the epipolar
lines are just the horizontal curves in an image. The search for
corresponding points will be limited to the same horizontal curve.
Locations of features can be transformed into the canonical stereo
system [Gennery 1979].


Figure 1 (epipolar geometry; labels include the image planes)


Perkins [Perkins 1970] pointed out the difficulties involved in
trying to resolve the matching problem without recourse to higher
level information. A hierarchical scheme for matching stereo images
of polyhedra was implemented by Ganapathy [Ganapathy 1976]. He
also studied various rules for stereo matching. A cooperative
computation algorithm was proposed by Marr, Poggio and Grimson
[Marr 1976, 1977, Grimson 1979] which matches random dot
stereograms. It uses a uniqueness constraint to assign at most one
disparity value to each point in the image and a continuity constraint
to require the disparity to vary smoothly, except at depth
discontinuities. Arnold and Binford [Arnold 1978] used edge
orientation, intensity and edge continuity to determine a set of
globally optimal matches. They [Arnold 80] also introduced
quasi-invariants for correspondence of edges and surface normals. Baker
and Binford [Baker 81] match edges on epipolar lines using those
quasi-invariants and use dynamic programming to preserve the order
of edges. A connectivity constraint was used to remove globally
inconsistent edge correspondences. Ohta and Kanade [Ohta 1983]
extended Baker's method from intra-scanline search to inter-scanline
search, which takes into account the mutual dependency between
epipolar lines in an image. Another system, which uses segments
(groups of collinear connected edge points) as matching primitives, was
implemented by Medioni and Nevatia [Medioni 1985]. Its
correspondence is based on a minimum differential disparity criterion.

OUR  METHOD 

We have chosen junctions, edgels, curves, and regions to be our
matching features. Junctions are end points of curves and the
intersections of curves. The degree of a junction is the number of
curves forming the junction. Junctions of degree three or less can be
classified as I (junction of degree one), L, Y, A, and T. Curves are
connected groups of edgels; each curve has two junctions. These
features are chosen because they are quasi-invariant
to view point. That is, with a change of view point, these features
remain unchanged in the images except under accidental alignment.
Also, all these features correspond to physical points in the scene,
with the exception of a T junction, which is normally formed by
occlusion. Another exception is the limb of any curved object, which
forms a curve in the image, but the location of this curve on the
object changes with view point. Therefore, we would like to
distinguish limbs from real edges and T junctions from other types of
junctions.

Our method matches not only junctions, edgels, curves, and
regions but also relations between them. Note that the junctions of a
matched pair of curves must match and the curves between two pairs
of matched junctions must also match.

Our technique determines all possible matches of junctions
lying on one epipolar line (or within an equivalence class of epipolar
lines depending on measurement error). We first assert that the type
and degree of the chosen junctions are matched. Though the type
and degree of the junctions are not view point invariant, this is a
good way to prune off unwanted matches. As for those matches that
are wrongly pruned away, we will come back later and match them
separately. We want to point out that in fact both the type and
degree of the junctions can vary widely under wide angle stereo.
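
As a rough illustration of this pruning step, the sketch below enumerates candidate junction pairs. It is an assumed formulation, not the authors' code: the `Junction` record, its field names, and the `row_tolerance` parameter (standing in for the equivalence class of epipolar lines under measurement error) are all hypothetical.

```python
from collections import namedtuple

# Hypothetical junction record: row is the epipolar (scan) line,
# kind is the junction type (I, L, Y, A or T), degree is the curve count.
Junction = namedtuple("Junction", "name row kind degree")

def candidate_matches(left, right, row_tolerance=0):
    """Return all (left name, right name) pairs that lie on the same
    epipolar line, within row_tolerance, and agree in type and degree."""
    pairs = []
    for l in left:
        for r in right:
            if abs(l.row - r.row) > row_tolerance:
                continue  # violates the epipolar constraint
            if l.kind != r.kind or l.degree != r.degree:
                continue  # pruned by type and degree
            pairs.append((l.name, r.name))
    return pairs
```

Pairs that are wrongly pruned here (because type or degree changed with view point) would have to be recovered in a later pass, as the text notes.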

In order to assert a particular match for a junction, all the
junctions which are connected to this junction by curves must be
matched unless occluded. The contrast and intensity across curves
joining these junctions must also be consistent. By so doing, we
implicitly enforce a constraint for matching globally. Uniqueness in
matching is enforced in our method. Each point in an image may
only be matched to at most one point in the other image. In effect,
we do not allow transparent objects in our images. This system now
deals with occlusion indicated by T junctions and limbs but not those
occlusions where surfaces disappear.

After  the  initial  matching,  we  come  back  to  those  curves  that 
are  still  unmatched.  For  those  curves  that  have  only  one  end  point 
matched,  we  start  matching  from  the  matched  junction  and  trace  the 
curve  trying  to  match  every  edgel  of  the  curve  to  the  corresponding 
curve  in  the  other  image.  Again  we  use  the  grey  scale  intensity  and 
contrast  across  the  edge,  direction  along  the  edge  and  the  epipolar 
constraint  to  confirm  the  match. 

In  essence,  when  we  match  junctions  with  the  global  constraint, 
we  actually  nail  down  some  points  in  the  image.  Using  these 
matched  points  as  reference,  we  try  to  match  all  points  between. 

ON  LINE  DRAWINGS 


We will demonstrate the effectiveness of the above technique
using some curve drawings. Part of the technique can be
implemented as an integer programming problem, or using other
constraint systems.


For every junction i in the left image A we try to find a
junction j in the right image B along the same epipolar line. For
each pair we create a variable ABij. Later we will assign ABij to be
1 if it is a match, 0 if it is not.


              1       2                   1       2
line 1        *-------*                   *-------*
             / \       \                 / \       \
line 3      /   \       \               /   \       \
           /     \       \             /     \       \
line 5  3 *     4 *-------* 5       3 *     4 *-------* 5
          |       |       |           |       |       |
line 7    |       |       |           |       |       |
          |       |       |           |       |       |
line 9    *-------*-------*           *-------*-------*
          6       7       8           6       7       8

              Image A                     Image B

Figure 2


For example, from the first epipolar line we create variables
AB11, AB12, AB21 and AB22 for all the possible matches of the
junctions 1 and 2 in image A to junctions 1 and 2 in image B. For
each junction we want to have all junctions connecting to it to be
matched also. That is, to match junction 1 in image A to junction 1
in image B, we require that junctions 3, 4 and 2 in the left image
match those in the right image. It is represented by:

3 ab11 - ab22 - ab33 - ab44 <= 0.

This says that the order of junction 1 is three and we need all
the connected junctions, namely 2, 3 and 4, to match in order for
junction 1 to be matched. In the meantime, we are also implicitly
matching the order of the junctions. Finally the condition that the
matching is unique is added. For example:

ab11 + ab12 = 1

which implies that junction 1 in the left image can only be
matched to junction 1 or 2 in the right image but not both. In this
example we did not add in the constraints that the contrasts across
the curves have to be matched. We want to maximize the evaluation
function, which is the total number of matched junctions. The results
indicate that all the junctions are matched correctly (see appendix for
the details of the program). Consequently, the curves between
junctions can be matched with little further effort.
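
The formulation above is small enough to check by exhaustive search. The sketch below is a hedged illustration rather than the program used in the paper: it replaces the integer programming package with brute-force enumeration, assumes the junction adjacency implied by the inequalities in the appendix (the same in both images), prunes candidates by degree before the search, and uses the "at most one match" form of the uniqueness constraint. The names `ADJ`, `LINE`, `feasible` and `solve` are hypothetical.

```python
from itertools import product

# Assumed adjacency of the eight junctions of the line drawing,
# taken to be identical in images A and B.
ADJ = {1: [2, 3, 4], 2: [1, 5], 3: [1, 6], 4: [1, 5, 7],
       5: [2, 4, 8], 6: [3, 7], 7: [4, 6, 8], 8: [5, 7]}
# Epipolar line on which each junction lies.
LINE = {1: 1, 2: 1, 3: 5, 4: 5, 5: 5, 6: 9, 7: 9, 8: 9}

def feasible(x):
    """Check the uniqueness and connectivity constraints for assignment x."""
    for k in ADJ:
        # Each junction is matched at most once in either direction.
        if sum(v for (i, j), v in x.items() if i == k) > 1:
            return False
        if sum(v for (i, j), v in x.items() if j == k) > 1:
            return False
    for (i, j), v in x.items():
        # degree(i) * AB[i, j] <= number of matched neighbor pairs,
        # so AB[i, j] can be 1 only if every neighbor of i is matched
        # to a neighbor of j.
        support = sum(x.get((p, q), 0) for p in ADJ[i] for q in ADJ[j])
        if len(ADJ[i]) * v > support:
            return False
    return True

def solve():
    """Maximize the number of matched junctions by exhaustive search."""
    # Candidate variables: same epipolar line and same degree.
    cands = [(i, j) for i in ADJ for j in ADJ
             if LINE[i] == LINE[j] and len(ADJ[i]) == len(ADJ[j])]
    best, best_count = None, -1
    for bits in product((0, 1), repeat=len(cands)):
        count = sum(bits)
        if count <= best_count:
            continue  # cannot improve on the best assignment so far
        x = dict(zip(cands, bits))
        if feasible(x):
            best, best_count = x, count
    return sorted(pair for pair, v in best.items() if v)
```

With twelve candidate variables the search visits only 4096 assignments; a real image would need a proper integer programming or constraint solver instead.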




The effectiveness of this technique can also be demonstrated by
the fact that it can solve two of the classical problems encountered in
stereo matching.


Consider the case of matching images of a checker board or
any other image with a repetitive pattern. All the junctions in the
inner part of the image are of the same type and degree. Any
matching strategy based on local constraints is bound to come up
with ambiguous matches.


Figure 3: Images A and B of a repetitive grid pattern; in each image
the junctions are numbered 1 to 12 in three rows of four.


Junctions A6, A7 in image A and junctions B6, B7 in image B
have the same type and degree. The grey scale and contrast nearby
can also be the same. They cannot be distinctively matched with
only local information. With global constraints, we note that the
order of A5 connecting to A6 is different from the order of B6
connecting to B7. Consequently, A6 cannot be matched to B7. The
same reasoning holds for A7 and B6. Therefore the ambiguity no
longer exists.


The above problem can sometimes be solved by matching the
order of features along an epipolar line. Though it is a good strategy
and it holds most of the time, order reversal does occur. The
following example illustrates that our technique can overcome the
problem of order reversal.


Figure 4: Images A and B, each with junctions numbered 1 to 6;
the left-to-right order of the junctions is reversed between the images.


Ai will match Bi (i = 1, ..., 6) using this matching technique.
Note that junction A3 is to the left of junction A4 in image A but B3
is to the right of junction B4. The reason that A4 is matched to B4
is that A2 and A5 connecting to A4 match with B2 and B5 of B4.
And we reject the match of A3 and B4 by the fact that A6
connecting to A3 does not match with B5 connecting to B4.

ON  REAL  IMAGES 

The curve segments used are obtained by applying the Nalwa
Operator [Nalwa 1984] to a pair of stereo images. The operator
produces a list of edgels by fitting a directional tanh-surface to
windows in the images. These edgels are then linked and fitted with
conic sections and straight lines [Nalwa 1985]. The linking and curve
fitting algorithms also locate L junctions, which we have utilised.

We have modified our algorithm to deal with data from real
images. In real images the detected features include
spurious edges, and some edges are missing. Some curves appear broken. Also
the type and degree of a junction may differ from a noise free line
drawing.

Therefore we want to bridge broken curves and reconstruct
junctions to the greatest extent before we do the matching. We
extend curves along their tangent directions. The intensity on both
sides of the curve, obtained from the image, is used to guide extending
the edge. Since we know the approximate position and direction of
the extended edge, we plan to use a small operator to pinpoint the
location of the extended edge after a larger operator to locate
unknown edges. This will give a more accurate localization of edges
that could not be located before. We find some new junctions when
the extended edge intersects another curve. The type and order of a
junction will change when the extended edge runs into that junction.

All junctions are classified by their type and order before the
matching phase. However, we feel that a good corner finder is
essential to find all the junctions and locate them more accurately.
Existing corner finders essentially search for places that have a high
curvature in the intensity surface. Many of these are really not
junctions in the image [Dreschler 1982, Kitchen 1982].

We then use the above algorithm to match all junctions that
can be matched. We relax the constraint that all connected junctions
must be on the same epipolar lines. They may have different degree
or type. In noisy images some junctions may be distorted because of
missing curves. Also, we require only that a majority of the connected
junctions be matched. A broken curve has different end points
compared to its corresponding curve. Broken curves are matched
point by point by tracing along the curves. Relaxing the requirement
also means that the global constraint is not as strong as before.
However, our results show that relaxing the constraints does not
cause significant deterioration.

Figure 5: Left image and right image.

Figure 6

Figure 7

Figure 8: Pre-processed and post-processed.

Figure 9: Pre-processed and post-processed.

In figure 5 we show a pair of grey scale images of some blocks,
cylinder and spheres. Figure 6 is the result from edge detection and
edge linking. Note that most of the junctions are not formed
properly. The main reason is that the edge detection algorithm is
designed to look for a single step edge in a window. It breaks down
when there is more than one edge in the window.

Figure 7 shows the results obtained by extending the curves.
Note that the missing gap on the cylinder is bridged. Some new
junctions are found and some old ones are modified as their degree
and type change. An enlargement of one of the blocks is shown in
figure 8 and 9. On the left are the pre-processed drawings and on the
right are the post-processed images.

Our algorithm matches all junctions except junction 3 in the
left image and junction 7 in the right (post-processed images of figure
8 and 9). Note that in matching curves between junctions 2 and 4,
we bypass the T junction between since it corresponds to an
occlusion junction. We are unable to match junction 3 because a
corresponding L junction is not found in the right image. Instead, a
high curvature conic curve is fitted to that junction. Failure to
match junction 7 is due partly to the above reason and partly to the
fact that the extended curves did not form a new junction. Matching
of the curves with both end points matched is simple. We match
the other unmatched curves starting from the matched junctions and
trace along both curves as described above.

COMPUTATION TIME

We believe that matching the junctions first and then edgels
along curves between junctions leads to efficient matching. The
number of junctions in an image is much less than the number of
edgels in the same image and the information at a junction is more
than that in an edgel.

We analyze the complexity of matching by edgels first and then
compare it with the method of matching junctions.

Assume there are

  E     = number of edgels in an image
  n x n = number of pixels in an image
  L     = average number of edgels in a curve

Therefore, on the average,

  E/n      = number of edgels per scan line
  (E/n)^2  = pairs of edgels per scan line for matching
  E^2/n    = pairwise combinations of edgels which satisfy
             the epipolar constraint for the whole image
  E/L      = number of curves in an image
  V = 2E/L = number of junctions (worst case where no two
             lines have the same end points)
  V/n      = number of junctions per scan line
  (V/n)^2  = pairs of junctions per scan line for matching
  V^2/n    = pairwise combinations of junctions which satisfy
             the epipolar constraint for the whole image

The total combination for matching junctions is

  V^2/n = (4/L^2)(E^2/n),

which is 4/L^2 times that of matching edgels.

When the curve segments are on average 30 pixels long,
L^2/4 = 225, so there is a 225 times reduction in computation time.
However, we must balance the cost of the stereo correspondence with
the cost of the segmentation and aggregation required to get the
additional structures before matching.
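
The counts above can be evaluated numerically with a short sketch. The function name `matching_combinations` and the sample values of E and n are hypothetical and are used only to exercise the formulas; the ratio depends on L alone.

```python
def matching_combinations(E, n, L):
    """Return (edgel pairs, junction pairs, reduction factor) for an
    n x n image with E edgels and L edgels per curve on average."""
    edgel_pairs = E ** 2 / n       # E^2/n over the whole image
    V = 2 * E / L                  # worst-case number of junctions
    junction_pairs = V ** 2 / n    # V^2/n over the whole image
    return edgel_pairs, junction_pairs, edgel_pairs / junction_pairs

# Hypothetical example: 6000 edgels in a 512 x 512 image, L = 30.
edgels, junctions, factor = matching_combinations(6000, 512, 30)
```

The reduction factor equals L^2/4 whatever the values of E and n.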

CONCLUSION 

We believe our technique offers an efficient and reliable way for
stereo matching. It matches not only the chosen features themselves
using local constraints but also the relationships between these
features by global constraints. However, we do have to modify the
technique in the presence of noise. This is a preliminary report. We
are incorporating many more constraints at all structural levels. The
system will be applied to more images derived from the real world
and more results will follow.

ACKNOWLEDGEMENT 

This work was supported by the Defense Advanced Research
Projects Agency under contract Number N00039-84-C-0211. The
authors would like to extend their gratitude to Vishvjit S. Nalwa and
Eric Pauchon for providing the necessary data for this work.

REFERENCE 




[Arnold 1978] D. Arnold, "Local context in matching edges for
stereo vision", Proceedings: Image understanding workshop, May,
1978.

[Baker 1981] H.H. Baker, "Depth from edge and intensity based
stereo", Ph.D. thesis, University of Illinois, September, 1981.

[Binford 1981] T.O. Binford, "Inferring surfaces from images",
Artificial Intelligence, Volume 17, 1981.

[Dreschler 1982] L. Dreschler, H.H. Nagel, "Volumetric model
and 3D trajectory of a moving car derived from monocular TV frame
sequences of a street scene", Computer graphics and image
processing, Volume 20, 1982.

[Hannah 1974] M.J. Hannah, "Computer matching of areas in
stereo imagery", Ph.D. thesis, Stanford University, 1974.

[Ganapathy 1976] S. Ganapathy, "Reconstruction of scenes
containing polyhedra from stereo pair of views", Ph.D. thesis,
Stanford University, 1976.

[Gennery 1979] D. Gennery, "Stereo camera calibration",
Proceedings: Image understanding workshop, November, 1979.

[Grimson 1979] W.E.L. Grimson, D. Marr, "A computer
implementation of a theory of human stereo vision", Proceedings:
Image understanding workshop, April, 1979.

[Henderson 1979] R.L. Henderson, W.J. Miller, C.B. Grosch,
"Automatic stereo reconstruction of man-made targets", Society of
photo-optical instrumentation engineers, Volume 186, Number 6,
1979.

[Kitchen 1982] L. Kitchen, A. Rosenfeld, "Gray-level corner
detection", Pattern recognition letters, Volume 1, 1982.

[MacVicar-Whelan 1981] P. MacVicar-Whelan, T. Binford,
"Line finding with subpixel precision", Proceedings: Image
understanding workshop, April, 1981.

[Malik 1984] J. Malik, T.O. Binford, "A theory of line drawing
interpretation", Proceedings: Image understanding workshop,
October, 1984.

[Marr 1976] D. Marr, T. Poggio, "Cooperative computation of
stereo disparity", Science, Volume 194, 1976.

[Marr 1977] D. Marr, T. Poggio, "A theory of human stereo
vision", MIT AI Memo, Number 451, November, 1977.

[Medioni 1985] G.M. Medioni, R. Nevatia, "Segment-based
stereo matching", Computer vision, graphics, and image processing,
Volume 31, 1985.

[Moravec 1980] H.P. Moravec, "Obstacle avoidance and
navigation in the real world by a seeing robot rover", Ph.D. thesis,
Stanford University, May, 1980.

[Nalwa 1984] V.S. Nalwa, "On detecting edges", Proceedings:
Image understanding workshop, October, 1984.

[Nalwa 1985] V.S. Nalwa, E. Pauchon, "Algorithms for
edgel-aggregation and edge-description", Proceedings: Image understanding
workshop, December, 1985.

[Ohta 1983] Y. Ohta, T. Kanade, "Stereo by intra- and
inter-scanline search using dynamic programming", Technical report
CMU-CS-83-162, October, 1983.

[Panton 1978] D.J. Panton, "A flexible approach to digital
stereo mapping", Photogrammetric engineering and remote sensing,
Volume 44, Number 12, December 1978.

[Perkins 1970] D.N. Perkins, "Computer stereo vision: A
combinatorial theory with implementation", Ph.D. thesis,
Department of Mathematics, M.I.T., June 1970.

[Takamura 1984] J. Takamura, T.O. Binford, "Stereo modeling
system: A geometric modeling system for modeling object instance
and class", Proceedings: Image understanding workshop, October,
1984.

APPENDIX 

The objective function says that we want to maximize the
number of matched junctions.

max  ab11 + ab12 + ab21 + ab22 + ab33 + ab34 + ab35 + ab43
   + ab44 + ab45 + ab53 + ab54 + ab55 + ab66 + ab67 + ab68
   + ab76 + ab77 + ab78 + ab86 + ab87 + ab88

st

(The following inequalities require that the junctions can
only be matched if all the junctions connecting to it are
also matched)

3ab11 - ab22 - ab33 - ab44 <= 0
2ab22 - ab11 - ab55 <= 0
2ab33 - ab11 - ab66 <= 0
3ab44 - ab11 - ab55 - ab77 <= 0
3ab45 - ab12 - ab54 - ab78 <= 0
3ab54 - ab21 - ab45 - ab87 <= 0
3ab55 - ab22 - ab44 - ab88 <= 0
2ab66 - ab33 - ab77 <= 0
2ab68 - ab35 - ab77 <= 0
3ab77 - ab44 - ab66 - ab88 <= 0
2ab86 - ab53 - ab77 <= 0
2ab88 - ab55 - ab77 <= 0

(The following equations establish the uniqueness
constraints)

ab11 + ab12 = 1
ab21 + ab22 = 1
ab33 + ab34 + ab35 = 1
ab43 + ab44 + ab45 = 1
ab53 + ab54 + ab55 = 1
ab66 + ab67 + ab68 = 1
ab76 + ab77 + ab78 = 1
ab86 + ab87 + ab88 = 1
ab11 + ab21 = 1
ab12 + ab22 = 1
ab33 + ab43 + ab53 = 1
ab34 + ab44 + ab54 = 1
ab35 + ab45 + ab55 = 1
ab66 + ab76 + ab86 = 1
ab67 + ab77 + ab87 = 1
ab68 + ab78 + ab88 = 1

(The following equations make sure that there is either a
match or no match but nothing between)

abij = 1 or 0 (i, j = 1, ..., 8)

(Results from LINDO, a linear programming package which
has an option of solving integer programming problems)

abij = 1 for i = j
abij = 0 for i != j



Direct Passive Navigation:
Analytical Solution for Planes


S. Negahdaripour and B.K.P. Horn

The  Artificial  Intelligence  Laboratory,  Massachusetts  Institute  of  Technology 


In this paper, we derive a closed form solution for recovering
the motion of an observer relative to a planar surface directly
from image brightness derivatives. We do not compute the optical
flow as an intermediate step, only the spatial and temporal
intensity gradients at a minimum of 8 points. We solve a linear
matrix equation for the elements of a 3x3 matrix. The eigenvalue
decomposition of its symmetric part is then used to compute the
motion parameters and the plane orientation.


1 Introduction

The problem of determining rigid body motion and surface
structure from image data has been the topic of many research
papers in the area of machine vision [1-22]. Many approaches based
on tracking feature points [5,11,19,20] or contours [9], or using the
motion flow field [1,3,4,10,12,16,17,21,22], texture [2], or image
intensity gradients [14,15] have been proposed in the literature.

In the feature point matching schemes, information about a
finite number of well-separated points is used to recover the motion
(general 8-point 2-frame algorithms of Longuet-Higgins [11], Tsai and
Huang [20], Buxton et al. [5], and the algorithm of Tsai, Huang and
Zhu [19] for planar surfaces). These methods require identifying and
matching feature points in a sequence of images. The minimum
number of points required depends on the number of image frames.
With 2 frames, in most cases, a minimum of 5 points results in a
unique solution from a set of nonlinear equations. However, using
8 points, as in the algorithms cited above, one only solves linear
equations. Here, it is assumed that the more difficult problem of
establishing point correspondence has already been solved. In
general, this involves determining corners along contours using
iterative searches. For images of smooth objects, it is difficult to
find good features or corners.

For the general case of smooth surfaces, Longuet-Higgins and
Prazdny [11] suggested a method that uses the optical flow and its
first and second derivatives at a single point. Later, Waxman and
Ullman [21] developed this into an algorithm for recovering the
structure and motion parameters from a set of nonlinear equations.
Subbarao and Waxman [17] recently found a closed form solution to
the original formulation in [21] for planar surfaces. These methods,
while mathematically elegant, are very sensitive to errors in the
optical flow data since second order derivatives of noisy data are
used.

At the expense of more computation, more robust algorithms
have been suggested using the optical flow at every image point
[1,3,4]. Longuet-Higgins [12] has presented a closed form solution
for planar surfaces, very similar to ours, using the coefficients of the
second order optical flow equations. However, it is assumed that
both components of the flow field have already been computed for a
minimum of 5 image points.

By representing a planar surface in the form of a
closed contour, Kanatani [9] has shown that the surface
and motion parameters can be computed by measuring
"diameters" of the contour using line and surface
integrals. Here, no point correspondence is required.
Assuming that the planar surface has a uniform texture
density, Aloimonos and Chou [2] have presented a procedure
for computing the motion and surface orientation
from texture.

In much of the research work in recovering surface
structure and motion from the optical flow field, it is assumed
that a reasonable estimate of the full optical flow
field is available. In general, the computation of the local
flow field exploits a constraint equation between the
local intensity changes and the two components of the
optical flow. However, this only gives the component of
the flow in the direction of the intensity gradient. To
compute the full flow field, one needs additional constraints
such as the heuristic assumption that the flow
field is locally smooth [7,8]. This, in many cases, leads
to optical flow fields that are not consistent with the
true motion field.

In an earlier paper, we presented an iterative scheme
for recovering the motion of an observer relative to a
planar surface directly from the image brightness derivatives,
without the need to compute the local flow field [14,15].
Further, using a compact vector notation, we showed
that, at most, two interpretations are possible for planar
surfaces and derived the relationship between them.
Here, we present a closed form solution to the same
problem. We first solve a linear matrix equation for the
elements of a 3x3 matrix using intensity derivatives at
a minimum of 8 non-collinear points. The special structure
of this matrix allows us to compute the motion and
structure parameters very easily.

2.  Preliminaries 

We first recall some details about perspective projection,
the motion field, the brightness change constraint
equation, rigid body motion and planar surfaces. This
we do using vector notation in order to keep the resulting
equations as compact as possible.

2.1.  Perspective  Projection 

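Let the center of projection be at the origin of a Cartesian
coordinate system. Without loss of generality we
assume that the effective focal length is unity. The image
is formed on the plane $z = 1$, parallel to the $xy$-plane,
that is, the optical axis lies along the $z$-axis. Let
$\mathbf{R}$ be a point in the scene. Its projection in the image
is $\mathbf{r}$, where

$$\mathbf{r} = \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{R},$$

and $\hat{\mathbf{z}}$ denotes the unit vector along the $z$-axis.
The $z$-component of $\mathbf{r}$ is clearly equal to one, that is
$\mathbf{r} \cdot \hat{\mathbf{z}} = 1$.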
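
The projection can be illustrated with a short Python sketch. The function name `project` and the check for points behind the camera are assumptions made for this illustration; the text only specifies unit focal length and the image plane $z = 1$.

```python
def project(R):
    """Perspective projection r = R / (R . z_hat) onto the image plane z = 1.

    R is a scene point (X, Y, Z) in the camera frame; unit focal length is
    assumed, as in the text.
    """
    X, Y, Z = R
    if Z <= 0:
        # Hypothetical guard: the text does not discuss points behind the camera.
        raise ValueError("scene point must lie in front of the camera (Z > 0)")
    return (X / Z, Y / Z, 1.0)


r = project((2.0, -1.0, 4.0))  # third component is always 1
```

The third component of the result is one by construction, which is the property $\mathbf{r} \cdot \hat{\mathbf{z}} = 1$ used throughout the derivation.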

2.2.  Motion  Field  and  Optical  Flow 

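The motion field is the vector field induced in the image
plane by the relative motion of the observer with
respect to the environment. The optical flow is the apparent
motion of brightness patterns. Under favourable
circumstances the optical flow is identical to the motion
field (moving shadows or uniform objects in motion
could create discrepancies between the motion field and
the optical flow. Here, we assume that the motion
field and the optical flow are the same). The velocity of
the image $\mathbf{r}$ of a point $\mathbf{R}$ is given by

$$\frac{d\mathbf{r}}{dt} = \frac{d}{dt}\left(\frac{\mathbf{R}}{\mathbf{R} \cdot \hat{\mathbf{z}}}\right).$$

For convenience, we introduce the notation $\mathbf{r}_t$ and $\mathbf{R}_t$
for the time derivatives of $\mathbf{r}$ and $\mathbf{R}$, respectively. We
then have

$$\mathbf{r}_t = \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{R}_t - \frac{\mathbf{R}_t \cdot \hat{\mathbf{z}}}{(\mathbf{R} \cdot \hat{\mathbf{z}})^2}\,\mathbf{R},$$

which can also be written in the compact form

$$\mathbf{r}_t = \frac{1}{(\mathbf{R} \cdot \hat{\mathbf{z}})^2}\bigl(\hat{\mathbf{z}} \times (\mathbf{R}_t \times \mathbf{R})\bigr),$$

since $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{c} \cdot \mathbf{a})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}$. The vector $\mathbf{r}_t$ lies
in the image plane, and so $(\mathbf{r}_t \cdot \hat{\mathbf{z}}) = 0$. Further, $\mathbf{r}_t = \mathbf{0}$
if $\mathbf{R}_t \parallel \mathbf{R}$, as expected.

Finally, noting that $\mathbf{R} = (\mathbf{R} \cdot \hat{\mathbf{z}})\mathbf{r}$, we get

$$\mathbf{r}_t = \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\bigl(\hat{\mathbf{z}} \times (\mathbf{R}_t \times \mathbf{r})\bigr).$$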
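
As a numerical sanity check, the quotient-rule form and the compact triple-product form of the image velocity can be compared directly. This is a minimal sketch; the helper names (`dot`, `cross`, `image_velocity_quotient`, `image_velocity_compact`) are illustrative and not taken from the paper.

```python
Z_HAT = (0.0, 0.0, 1.0)  # unit vector along the optical axis


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(k, a):
    return tuple(k * x for x in a)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def image_velocity_quotient(R, R_t):
    """r_t = R_t / (R.z) - (R_t.z) / (R.z)^2 * R  (quotient rule)."""
    depth = dot(R, Z_HAT)
    return sub(scale(1.0 / depth, R_t),
               scale(dot(R_t, Z_HAT) / depth ** 2, R))


def image_velocity_compact(R, R_t):
    """r_t = (1 / (R.z)) * z x (R_t x r), with r = R / (R.z)."""
    depth = dot(R, Z_HAT)
    r = scale(1.0 / depth, R)
    return scale(1.0 / depth, cross(Z_HAT, cross(R_t, r)))
```

Both functions should agree for any scene point with nonzero depth, and the third component of the result vanishes because the velocity lies in the image plane.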


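2.3. Rigid Body Motion

In the case of the observer moving relative to a rigid
environment with translational velocity $\mathbf{t}$ and rotational
velocity $\boldsymbol{\omega}$, we find that the motion of a point in the
environment relative to the observer is given by

$$\mathbf{R}_t = -\boldsymbol{\omega} \times \mathbf{R} - \mathbf{t}.$$

Since $\mathbf{R} = (\mathbf{R} \cdot \hat{\mathbf{z}})\mathbf{r}$, we can write this as

$$\mathbf{R}_t = -(\mathbf{R} \cdot \hat{\mathbf{z}})\,\boldsymbol{\omega} \times \mathbf{r} - \mathbf{t}.$$

Substituting for $\mathbf{R}_t$ in the formula derived above for $\mathbf{r}_t$,
we obtain

$$\mathbf{r}_t = -\left(\hat{\mathbf{z}} \times \left(\mathbf{r} \times \left(\mathbf{r} \times \boldsymbol{\omega} - \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{t}\right)\right)\right).$$

It is important to remember that there is an inherent
ambiguity here, since the same motion field results when
distance and the translational velocity are multiplied by
an arbitrary constant. This can be seen easily from the
above equation since the same image plane velocity is
obtained if one multiplies both $\mathbf{R}$ and $\mathbf{t}$ by some
constant.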
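
The scale ambiguity can be checked numerically. The sketch below evaluates the rigid-body motion field at an image point and is only an illustration: the function names and the sample values are assumptions, and the depth $\mathbf{R} \cdot \hat{\mathbf{z}}$ is passed in as a separate argument.

```python
Z_HAT = (0.0, 0.0, 1.0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(k, a):
    return tuple(k * x for x in a)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def motion_field(r, depth, omega, t):
    """r_t = -z x (r x (r x omega - t / depth)) for rigid body motion.

    r      image point (x, y, 1)
    depth  R . z_hat, the distance of the scene point along the optical axis
    omega  rotational velocity of the observer
    t      translational velocity of the observer
    """
    inner = sub(cross(r, omega), scale(1.0 / depth, t))
    return scale(-1.0, cross(Z_HAT, cross(r, inner)))


r = (0.25, 0.5, 1.0)
omega = (0.1, -0.2, 0.3)
t = (1.0, 0.0, 2.0)

near = motion_field(r, 4.0, omega, t)
# Doubling both the depth and the translation leaves the image velocity unchanged.
far = motion_field(r, 8.0, omega, scale(2.0, t))
```

Only the ratio of translation to depth enters the formula, which is why `near` and `far` coincide.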

2.4.  Brightness  Change  Equation 

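The brightness of the image of a particular patch of a
surface depends on many factors. It may for example
vary with the orientation of the patch. In many cases,
however, it remains at least approximately constant as
the surface moves in the environment. If we assume
that the image brightness of a patch remains constant,
we have

$$\frac{\partial E}{\partial \mathbf{r}} \cdot \frac{d\mathbf{r}}{dt} + \frac{\partial E}{\partial t} = 0,$$

where $\partial E/\partial \mathbf{r} = (\partial E/\partial x, \partial E/\partial y, 0)^T$ is the image brightness
gradient. It is convenient to use the notation $E_\mathbf{r}$
for this quantity and $E_t$ for the time derivative of the
brightness. Then we can write the brightness change
equation in the simple form

$$E_\mathbf{r} \cdot \mathbf{r}_t + E_t = 0.$$

Substituting for $\mathbf{r}_t$ we get

$$E_t - E_\mathbf{r} \cdot \left(\hat{\mathbf{z}} \times \left(\mathbf{r} \times \left(\mathbf{r} \times \boldsymbol{\omega} - \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{t}\right)\right)\right) = 0.$$

Now

$$E_\mathbf{r} \cdot (\hat{\mathbf{z}} \times (\mathbf{r} \times \mathbf{t})) = (E_\mathbf{r} \times \hat{\mathbf{z}}) \cdot (\mathbf{r} \times \mathbf{t}) = ((E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}) \cdot \mathbf{t},$$

and by similar reasoning

$$E_\mathbf{r} \cdot (\hat{\mathbf{z}} \times (\mathbf{r} \times (\mathbf{r} \times \boldsymbol{\omega}))) = (((E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}) \times \mathbf{r}) \cdot \boldsymbol{\omega},$$

so we have

$$E_t - (((E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}) \times \mathbf{r}) \cdot \boldsymbol{\omega} + \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,((E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}) \cdot \mathbf{t} = 0.$$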




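To make this constraint equation more compact, let us
define $c = E_t$, $\mathbf{s} = (E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}$, and $\mathbf{v} = -\mathbf{s} \times \mathbf{r}$; then,
finally,

$$c + \mathbf{v} \cdot \boldsymbol{\omega} + \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{s} \cdot \mathbf{t} = 0.$$

This is the brightness change equation in the case of
rigid body motion.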

2.5. Planar Surface

A particularly impoverished scene is one consisting of a
single planar surface. The equation for such a surface is

$$\mathbf{R} \cdot \mathbf{n} = 1,$$

where $\mathbf{n}/|\mathbf{n}|$ is a unit normal to the plane, and $1/|\mathbf{n}|$ is
the perpendicular distance of the plane from the origin.
Since $\mathbf{R} = (\mathbf{R} \cdot \hat{\mathbf{z}})\mathbf{r}$, we can write this as

$$\mathbf{r} \cdot \mathbf{n} = \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}},$$

so the constraint equation becomes

$$c + \mathbf{v} \cdot \boldsymbol{\omega} + (\mathbf{r} \cdot \mathbf{n})(\mathbf{s} \cdot \mathbf{t}) = 0.$$

This is the brightness change equation for a planar surface.
Note again the inherent ambiguity in the constraint
equation. It is satisfied equally well by two
planes with the same orientation but at different distances
provided that the translational velocities are in
the same proportions.
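
The planar constraint can be verified on synthetic data. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the brightness gradient is chosen arbitrarily, and $E_t$ is synthesised from brightness constancy ($E_t = -E_\mathbf{r} \cdot \mathbf{r}_t$) using the rigid-body motion field with depth $1/(\mathbf{r} \cdot \mathbf{n})$.

```python
Z_HAT = (0.0, 0.0, 1.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(k, a):
    return tuple(k * x for x in a)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def constraint_vectors(r, E_r):
    """s = (E_r x z) x r and v = -s x r, as defined in the text."""
    s = cross(cross(E_r, Z_HAT), r)
    v = scale(-1.0, cross(s, r))
    return s, v


def planar_residual(c, r, E_r, omega, t, n):
    """Left-hand side c + v.omega + (r.n)(s.t); zero when the constraint holds."""
    s, v = constraint_vectors(r, E_r)
    return c + dot(v, omega) + dot(r, n) * dot(s, t)


def synthetic_E_t(r, E_r, omega, t, n):
    """Time derivative of brightness implied by brightness constancy (test data only)."""
    inv_depth = dot(r, n)                      # 1 / (R . z) for a point on the plane
    inner = sub(cross(r, omega), scale(inv_depth, t))
    r_t = scale(-1.0, cross(Z_HAT, cross(r, inner)))
    return -dot(E_r, r_t)


r = (0.25, 0.5, 1.0)
E_r = (2.0, -1.0, 0.0)
omega, t, n = (0.1, -0.2, 0.3), (1.0, 0.0, 2.0), (0.1, -0.05, 0.25)
residual = planar_residual(synthetic_E_t(r, E_r, omega, t, n), r, E_r, omega, t, n)
```

With consistent data the residual is zero up to rounding; with measured derivatives it would be the quantity that Section 3 minimises.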

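3.0. Essential Parameters for Planar Surfaces

The brightness change equation can be written as:

$$c + (\mathbf{r} \times \mathbf{s}) \cdot \boldsymbol{\omega} + (\mathbf{r} \cdot \mathbf{n})(\mathbf{s} \cdot \mathbf{t}) = 0.$$

Now, using the identity $(\mathbf{r} \times \mathbf{s}) \cdot \boldsymbol{\omega} = (\mathbf{s} \times \boldsymbol{\omega}) \cdot \mathbf{r}$, we get:

$$c + (\mathbf{s} \times \boldsymbol{\omega}) \cdot \mathbf{r} + (\mathbf{r} \cdot \mathbf{n})(\mathbf{s} \cdot \mathbf{t}) = 0.$$

Let us define:

$$\mathbf{W} = -\begin{pmatrix} 0 & -\omega_3 & +\omega_2 \\ +\omega_3 & 0 & -\omega_1 \\ -\omega_2 & +\omega_1 & 0 \end{pmatrix},$$

then $(\mathbf{s} \times \boldsymbol{\omega}) = \mathbf{W}\mathbf{s}$, in which case we arrive at:

$$c + \mathbf{r}^T \mathbf{W} \mathbf{s} + \mathbf{r}^T \mathbf{n}\mathbf{t}^T \mathbf{s} = 0.$$

Finally, after collecting terms:

$$c + \mathbf{r}^T \mathbf{P} \mathbf{s} = 0,$$

where

$$\mathbf{P} = \begin{pmatrix} p_1 & p_2 & p_3 \\ p_4 & p_5 & p_6 \\ p_7 & p_8 & p_9 \end{pmatrix} = \mathbf{W} + \mathbf{n}\mathbf{t}^T.$$

We will refer to $p_i$, $i = 1, 2, \ldots, 9$ as the essential parameters
(in agreement with Tsai and Huang [20]) since
these parameters contain all the information about the
planar surface orientation and motion parameters.

The above constraint equation is linear in the elements
of $\mathbf{P}$. Several such equations, for different image
points, can be used to solve for these parameters. We
will show how the special structure of $\mathbf{P}$ can be exploited
to recover the motion and structure parameters
very easily.

3. Recovering Essential Parameters

The nine essential parameters satisfy the following constraint
equation:

$$c + \mathbf{r}^T \mathbf{P} \mathbf{s} = 0,$$

or in terms of $\mathbf{p}$, a vector of length 9 whose $i$-th element
is $p_i$:

$$c + \mathbf{a}^T \mathbf{p} = 0,$$

where

$$\mathbf{a} = (r_1 s_1, r_1 s_2, r_1 s_3, r_2 s_1, r_2 s_2, r_2 s_3, r_3 s_1, r_3 s_2, r_3 s_3)^T.$$

We can compute them using image brightness $E(x, y, t)$,
and its spatial and time derivatives, $E_\mathbf{r}$ and $E_t$, over
some region $I$ in the image plane. Since there are only
eight motion and surface parameters to recover (there
are three components of each of $\boldsymbol{\omega}$, $\mathbf{t}$, and $\mathbf{n}$; however,
the translational velocity and the surface normal can
be recovered only up to a scale factor), only eight of the
$p$'s are independent. This implies that we can arbitrarily
fix one of the essential parameters, and compute
the remaining eight using eight independent points (at
each point, we get one constraint and we have eight
unknowns to recover).

Let $\mathbf{p}'^T = (p'_1, p'_2, \ldots, p'_8, 0)$ denote the solution obtained
by setting the last element equal to zero. If we
now let $\mathbf{p}'$ and $\mathbf{a}$ stand for the corresponding vectors of
length eight, with the last element dropped:

$$\mathbf{p}' = (p'_1, p'_2, \ldots, p'_8)^T,$$

$$\mathbf{a} = (r_1 s_1, r_1 s_2, r_1 s_3, r_2 s_1, r_2 s_2, r_2 s_3, r_3 s_1, r_3 s_2)^T,$$

then the above constraint equation reduces to:

$$\mathbf{a}^T \mathbf{p}' + c = 0.$$

Using eight independent points, we can solve the following
linear matrix equation:

$$\mathbf{A}\mathbf{p}' + \mathbf{c} = \mathbf{0},$$

where

$$\mathbf{A} = (\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_8)^T, \qquad \mathbf{c} = (c_1, c_2, \ldots, c_8)^T.$$

The solution of the above equation is:

$$\mathbf{p}' = -\mathbf{A}^{-1}\mathbf{c}.$$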

Image brightness values are distorted with sensor
noise and quantization error. These inaccuracies are
further accentuated by methods used for estimating the
brightness gradient. Thus it is not advisable to base a
method on measurements at just a few points. Instead
we propose to minimize the error in the brightness constraint
equation over the whole region $I$ in the image
plane. So we choose the essential parameters that minimize:

$$\iint_I (\mathbf{a}^T \mathbf{p}' + c)^2 \, dx \, dy.$$

The solution, in this case, is given by:

$$\mathbf{p}' = -\left(\iint_I \mathbf{a}\mathbf{a}^T \, dx \, dy\right)^{-1} \left(\iint_I c\,\mathbf{a} \, dx \, dy\right).$$
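
A discrete version of this least-squares estimate can be sketched in Python, with the integrals replaced by sums over sampled image points. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the normal equations are solved by plain Gaussian elimination, and `synthetic_samples` generates noise-free test data from a known matrix $\mathbf{P}$.

```python
import random


def s_vector(r, E_r):
    """s = (E_r x z) x r written out for r = (x, y, 1) and E_r = (Ex, Ey, 0)."""
    x, y, _ = r
    Ex, Ey, _ = E_r
    return (-Ex, -Ey, x * Ex + y * Ey)


def row(r, s):
    """The eight products r_i s_j in row-major order, with r_3 s_3 dropped."""
    return [r[i] * s[j] for i in range(3) for j in range(3)][:8]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def estimate_p_prime(samples):
    """Minimise sum (a.p' + c)^2 over samples (r, E_r, c); returns the 8-vector p'."""
    M = [[0.0] * 8 for _ in range(8)]
    b = [0.0] * 8
    for r, E_r, c in samples:
        a = row(r, s_vector(r, E_r))
        for i in range(8):
            b[i] -= c * a[i]               # right-hand side: -sum c a
            for j in range(8):
                M[i][j] += a[i] * a[j]     # normal matrix: sum a a^T
    return solve(M, b)


def synthetic_samples(P, count, seed=0):
    """Noise-free samples with c = -r^T P s (test data only)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        r = (rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0)
        E_r = (rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0)
        s = s_vector(r, E_r)
        c = -sum(r[i] * P[i][j] * s[j] for i in range(3) for j in range(3))
        samples.append((r, E_r, c))
    return samples
```

On noise-free data the estimate equals the first eight elements of $\mathbf{P} - p_9\mathbf{I}$ rather than of $\mathbf{P}$ itself, because the ninth parameter has been fixed at zero.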

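Note that, in general, the true $p_9$ is not zero. We can
show that the solution obtained through the assumption
that $p_9 = 0$, $\mathbf{p}'$, and the true solution (denoted by $\mathbf{p}$)
are related by the equation:

$$\mathbf{p} = \mathbf{p}' + p_9 \mathbf{u}, \qquad \mathbf{u} = (1, 0, 0, 0, 1, 0, 0, 0, 1)^T.$$

The proof goes as follows. Since $\mathbf{s} = (E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}$, then
$\mathbf{r}^T \mathbf{s} = \mathbf{r} \cdot ((E_\mathbf{r} \times \hat{\mathbf{z}}) \times \mathbf{r}) = 0$. For any arbitrary constant
$l$, such that $\mathbf{L} = l\mathbf{I}$ ($\mathbf{I}$ is the identity matrix), we have:

$$\mathbf{r}^T \mathbf{L} \mathbf{s} = 0.$$

If we add this to our constraint equation, we get:

$$c + \mathbf{r}^T (\mathbf{W} + \mathbf{n}\mathbf{t}^T + \mathbf{L}) \mathbf{s} = 0.$$

It is immediately apparent that any $\mathbf{P}'$ of the form:

$$\mathbf{P}' = \begin{pmatrix} p'_1 & p'_2 & p'_3 \\ p'_4 & p'_5 & p'_6 \\ p'_7 & p'_8 & p'_9 \end{pmatrix} = \mathbf{W} + \mathbf{n}\mathbf{t}^T + \mathbf{L}$$

will also satisfy our constraint equation. Therefore, the
two solutions for the essential parameters are related
by:

$$\mathbf{P} = \mathbf{P}' - \mathbf{L},$$

or in terms of $\mathbf{p}$ and $\mathbf{p}'$:

$$\mathbf{p} = \mathbf{p}' - l\mathbf{u}, \qquad \mathbf{u} = (1, 0, 0, 0, 1, 0, 0, 0, 1)^T,$$

for some constant $l$. It follows that $p_9 = p'_9 - l$. Now if
$p'_9 = 0$, then $l = -p_9$, and so:

$$\mathbf{p} = \mathbf{p}' + p_9 \mathbf{u}.$$

In terms of the matrices $\mathbf{P}$ and $\mathbf{P}'$:

$$\mathbf{P} = \mathbf{P}' + p_9 \mathbf{I}.$$

The procedure for determining $p_9$, and subsequently $\mathbf{P}$,
is presented in the appendix. For now, we assume that
$\mathbf{P}$ is known (note that we have, so far, shown how to
compute $\mathbf{P}'$ and not $\mathbf{P}$).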


4.  Recovering  Motion  and  Structure 

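We now show how to compute the parameters of the
translational motion and the plane orientation from the
essential parameters. Once these are known, the rotational
parameters are determined from:

$$\mathbf{W} = \mathbf{P} - \mathbf{n}\mathbf{t}^T.$$

Since $\mathbf{W}$ is skew-symmetric:

$$\mathbf{P}^* = \mathbf{P} + \mathbf{P}^T = (\mathbf{W} + \mathbf{W}^T) + (\mathbf{t}\mathbf{n}^T + \mathbf{n}\mathbf{t}^T) = (\mathbf{t}\mathbf{n}^T + \mathbf{n}\mathbf{t}^T).$$

We will derive our results in terms of the normalized
$\mathbf{n}$, $\mathbf{t}$, and $\mathbf{P}^*$, that is:

$$\hat{\mathbf{n}} = \frac{\mathbf{n}}{|\mathbf{n}|}, \qquad \hat{\mathbf{t}} = \frac{\mathbf{t}}{|\mathbf{t}|}, \qquad \hat{\mathbf{P}}^* = \frac{1}{|\mathbf{n}||\mathbf{t}|}\,\mathbf{P}^*.$$

In order to present the solutions for $\hat{\mathbf{n}}$ and $\hat{\mathbf{t}}$, it is
necessary to express the eigenvalue decomposition of $\hat{\mathbf{P}}^*$
in terms of these vectors. We will do so in the next
lemma.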


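Lemma 1: Let $\hat{\mathbf{P}}^* = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$ be the eigenvalue decomposition
of $\hat{\mathbf{P}}^* = (\hat{\mathbf{t}}\hat{\mathbf{n}}^T + \hat{\mathbf{n}}\hat{\mathbf{t}}^T)$, and let $\tau = \frac{1}{2}\,\mathrm{Trace}\,\hat{\mathbf{P}}^*$.
Then:

$$\boldsymbol{\Lambda} = \begin{pmatrix} \tau - 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \tau + 1 \end{pmatrix},$$

$$\mathbf{U} = \left[\frac{\hat{\mathbf{t}} - \hat{\mathbf{n}}}{\sqrt{2(1 - \tau)}}, \quad \frac{\hat{\mathbf{t}} \times \hat{\mathbf{n}}}{|\hat{\mathbf{t}} \times \hat{\mathbf{n}}|}, \quad \frac{\hat{\mathbf{t}} + \hat{\mathbf{n}}}{\sqrt{2(1 + \tau)}}\right].$$

Proof: $(\hat{\mathbf{t}} \times \hat{\mathbf{n}})$ is the eigenvector corresponding to the
zero eigenvalue of $\hat{\mathbf{P}}^*$ since
$(\hat{\mathbf{t}}\hat{\mathbf{n}}^T + \hat{\mathbf{n}}\hat{\mathbf{t}}^T)(\hat{\mathbf{t}} \times \hat{\mathbf{n}}) = \hat{\mathbf{t}}(\hat{\mathbf{n}} \cdot (\hat{\mathbf{t}} \times \hat{\mathbf{n}})) + \hat{\mathbf{n}}(\hat{\mathbf{t}} \cdot (\hat{\mathbf{t}} \times \hat{\mathbf{n}})) = \mathbf{0}$.
Since it is symmetric, $\hat{\mathbf{P}}^*$ has
3 orthogonal eigenvectors. The other two eigenvectors
are, therefore, in the plane containing $\hat{\mathbf{t}}$ and $\hat{\mathbf{n}}$. Let
$\mathbf{u} = \alpha\hat{\mathbf{t}} + \beta\hat{\mathbf{n}}$ and $\lambda$ denote an eigenvector-eigenvalue
pair for some $\alpha$ and $\beta$ (to be determined). Then

$$(\hat{\mathbf{t}}\hat{\mathbf{n}}^T + \hat{\mathbf{n}}\hat{\mathbf{t}}^T)(\alpha\hat{\mathbf{t}} + \beta\hat{\mathbf{n}}) = \lambda(\alpha\hat{\mathbf{t}} + \beta\hat{\mathbf{n}}),$$

which simplifies to

$$[\alpha(\hat{\mathbf{t}} \cdot \hat{\mathbf{n}}) + \beta(\hat{\mathbf{n}} \cdot \hat{\mathbf{n}})]\,\hat{\mathbf{t}} + [\alpha(\hat{\mathbf{t}} \cdot \hat{\mathbf{t}}) + \beta(\hat{\mathbf{t}} \cdot \hat{\mathbf{n}})]\,\hat{\mathbf{n}} = \lambda\alpha\hat{\mathbf{t}} + \lambda\beta\hat{\mathbf{n}}.$$

Since $(\hat{\mathbf{t}} \cdot \hat{\mathbf{n}}) = \tau$, we have:

$$(\tau - \lambda)\alpha + \beta = 0, \qquad \alpha + (\tau - \lambda)\beta = 0.$$

The solutions for $\lambda$ are given by:

$$\lambda = \tau \pm 1.$$

Substituting for $\lambda$ into the earlier equations, we get:

$$\alpha = \pm\beta.$$

Using these in the equation for $\mathbf{u}$, and normalizing the
results, yields:

$$\mathbf{u} = \frac{\hat{\mathbf{t}} \pm \hat{\mathbf{n}}}{\sqrt{2(1 \pm \tau)}}.$$

Note that since $|\tau| < 1$, we have $\lambda_1 < \lambda_2 = 0 < \lambda_3$.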
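
Lemma 1 can be checked numerically without an eigenvalue solver, by applying $\hat{\mathbf{P}}^*$ to each claimed eigenvector. The following is a minimal sketch with illustrative function names; it assumes unit vectors $\hat{\mathbf{t}}$ and $\hat{\mathbf{n}}$ that are not parallel.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def combine(ka, a, kb, b):
    """Linear combination ka * a + kb * b."""
    return tuple(ka * x + kb * y for x, y in zip(a, b))


def lemma1_eigensystem(t_hat, n_hat):
    """Eigenvalues (tau-1, 0, tau+1) and eigenvectors claimed by Lemma 1."""
    tau = dot(t_hat, n_hat)
    if abs(tau) >= 1.0:
        raise ValueError("t_hat and n_hat must not be parallel")
    k1 = 1.0 / math.sqrt(2.0 * (1.0 - tau))
    k3 = 1.0 / math.sqrt(2.0 * (1.0 + tau))
    txn = cross(t_hat, n_hat)
    k2 = 1.0 / math.sqrt(dot(txn, txn))
    u1 = combine(k1, t_hat, -k1, n_hat)
    u2 = tuple(k2 * x for x in txn)
    u3 = combine(k3, t_hat, k3, n_hat)
    return (tau - 1.0, 0.0, tau + 1.0), (u1, u2, u3)


def apply_P_star(t_hat, n_hat, u):
    """(t n^T + n t^T) u = t (n.u) + n (t.u), without forming the matrix."""
    return combine(dot(n_hat, u), t_hat, dot(t_hat, u), n_hat)
```

For each pair $(\lambda_i, \mathbf{u}_i)$ returned, `apply_P_star(t_hat, n_hat, u_i)` should equal $\lambda_i \mathbf{u}_i$, and each $\mathbf{u}_i$ should have unit length.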


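We can now determine $\hat{\mathbf{t}}$ and $\hat{\mathbf{n}}$. Let $\mathbf{u}_i$ denote the
$i$-th column of $\mathbf{U}$. From lemma 1:

$$\mathbf{u}_1 = \frac{\hat{\mathbf{t}} - \hat{\mathbf{n}}}{\sqrt{2(1 - \tau)}}, \qquad \mathbf{u}_3 = \frac{\hat{\mathbf{t}} + \hat{\mathbf{n}}}{\sqrt{2(1 + \tau)}}.$$

From the expressions for the eigenvectors, it follows
that:

$$\hat{\mathbf{t}} + \hat{\mathbf{n}} = \sqrt{2(1 + \tau)}\,\mathbf{u}_3, \qquad \hat{\mathbf{t}} - \hat{\mathbf{n}} = \sqrt{2(1 - \tau)}\,\mathbf{u}_1.$$

Solving these for $\hat{\mathbf{t}}$ and $\hat{\mathbf{n}}$, we get:

$$\hat{\mathbf{n}} = \sqrt{\frac{1 + \tau}{2}}\,\mathbf{u}_3 - \sqrt{\frac{1 - \tau}{2}}\,\mathbf{u}_1, \qquad \hat{\mathbf{t}} = \sqrt{\frac{1 + \tau}{2}}\,\mathbf{u}_3 + \sqrt{\frac{1 - \tau}{2}}\,\mathbf{u}_1.$$

Since the choice of the signs of the eigenvectors is arbitrary,
we should repeat the above procedure with the
sign of either $\mathbf{u}_1$ or $\mathbf{u}_3$ reversed:

$$\mathbf{u}_1 = -\frac{\hat{\mathbf{t}} - \hat{\mathbf{n}}}{\sqrt{2(1 - \tau)}}, \qquad \mathbf{u}_3 = \frac{\hat{\mathbf{t}} + \hat{\mathbf{n}}}{\sqrt{2(1 + \tau)}}.$$

In this case, the second solution is obtained from the
following equations:

$$\hat{\mathbf{n}} = \sqrt{\frac{1 + \tau}{2}}\,\mathbf{u}_3 + \sqrt{\frac{1 - \tau}{2}}\,\mathbf{u}_1, \qquad \hat{\mathbf{t}} = \sqrt{\frac{1 + \tau}{2}}\,\mathbf{u}_3 - \sqrt{\frac{1 - \tau}{2}}\,\mathbf{u}_1.$$

This is the dual solution for planar surfaces.

The special case of $\hat{\mathbf{t}} \parallel \hat{\mathbf{n}}$ corresponds to when the
matrix $\hat{\mathbf{P}}^*$ has multiple eigenvalues. Then, either:

1) $\tau = 1$, for which $\lambda_1 = \lambda_2 = 0$. Then the two solutions
merge to the single one:

$$\hat{\mathbf{n}} = \hat{\mathbf{t}} = \mathbf{u}_3,$$

or

2) $\tau = -1$, so that $\lambda_2 = \lambda_3 = 0$. Here, the unique
solution is given by:

$$\hat{\mathbf{n}} = -\hat{\mathbf{t}} = \mathbf{u}_1.$$

Since the translational vector and the surface normal
can be recovered only up to a scale factor, we can, without
loss of generality, set $|\mathbf{t}| = 1$. It can be easily shown
that:

$$\lambda(\mathbf{P}^*) = |\mathbf{n}||\mathbf{t}|\,\lambda(\hat{\mathbf{P}}^*) = |\mathbf{n}|\,\lambda(\hat{\mathbf{P}}^*).$$

Therefore:

$$\lambda_3(\mathbf{P}^*) - \lambda_1(\mathbf{P}^*) = 2|\mathbf{n}|,$$

or

$$|\mathbf{n}| = \frac{1}{2}\bigl(\lambda_3(\mathbf{P}^*) - \lambda_1(\mathbf{P}^*)\bigr).$$

Since

$$\mathbf{P}^* = |\mathbf{n}||\mathbf{t}|\,\hat{\mathbf{P}}^* = |\mathbf{n}|\,\hat{\mathbf{P}}^*,$$

we use the following equation to normalize $\mathbf{P}^*$ in the
first place:

$$\hat{\mathbf{P}}^* = \frac{2}{\lambda_3(\mathbf{P}^*) - \lambda_1(\mathbf{P}^*)}\,\mathbf{P}^*.$$

Once we compute $\hat{\mathbf{n}}$ and $\hat{\mathbf{t}}$ from the equations given earlier,
we can determine $\mathbf{n}$ and $\mathbf{t}$ through proper scaling:

$$\mathbf{n} = |\mathbf{n}|\,\hat{\mathbf{n}}, \qquad \mathbf{t} = \hat{\mathbf{t}}.$$

We then solve for the rotation parameters by substituting
the solutions for $\mathbf{n}$ and $\mathbf{t}$ into the equation:

$$\mathbf{W} = \mathbf{P} - \mathbf{n}\mathbf{t}^T.$$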
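
The recovery of the two solutions from the eigenvectors can be sketched as follows. The function names are hypothetical, the eigenvectors $\mathbf{u}_1$, $\mathbf{u}_3$ and $\tau$ are assumed to be available already, and the rotation is read off $\mathbf{W}$ under the sign convention $\mathbf{W}\mathbf{s} = \mathbf{s} \times \boldsymbol{\omega}$ used above.

```python
import math


def combine(ka, a, kb, b):
    """Linear combination ka * a + kb * b."""
    return tuple(ka * x + kb * y for x, y in zip(a, b))


def unit_solutions(u1, u3, tau):
    """Return the two (n_hat, t_hat) pairs obtained from u1, u3 and tau."""
    a = math.sqrt((1.0 + tau) / 2.0)
    b = math.sqrt((1.0 - tau) / 2.0)
    first = (combine(a, u3, -b, u1), combine(a, u3, b, u1))
    # Reversing the sign of u1 exchanges the roles of n_hat and t_hat (dual solution).
    second = (combine(a, u3, b, u1), combine(a, u3, -b, u1))
    return first, second


def rotation_from(P, n, t):
    """omega from W = P - n t^T, with W = [[0, w3, -w2], [-w3, 0, w1], [w2, -w1, 0]]."""
    W = [[P[i][j] - n[i] * t[j] for j in range(3)] for i in range(3)]
    return (W[1][2], W[2][0], W[0][1])
```

Given $|\mathbf{n}|$ from the eigenvalue difference, scaling the first element of either pair by $|\mathbf{n}|$ and passing the result to `rotation_from` yields the corresponding rotation; the two rotations differ by $\mathbf{n} \times \mathbf{t}$.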

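Even though we gave a complete and compact proof
of the dual solution in an earlier paper [15], it is intriguing
to confirm those results with our closed form solution.
We showed that the two solutions are related
by:

$$\mathbf{n}^* = |\mathbf{n}|\,\mathbf{t}, \qquad \mathbf{t}^* = \frac{\mathbf{n}}{|\mathbf{n}|}, \qquad \boldsymbol{\omega}^* = \boldsymbol{\omega} + \mathbf{n} \times \mathbf{t},$$

where we have arbitrarily set $|\mathbf{t}| = 1$. The two solutions
given earlier for $\mathbf{n}$ and $\mathbf{t}$ already satisfy the duality relationship
given above. It remains to show the same for
the rotation parameters. We only have to show that:

$$\mathbf{W}^* + \mathbf{n}^*\mathbf{t}^{*T} = \mathbf{P},$$

where

$$\mathbf{W}^* = \mathbf{W} + \begin{pmatrix} 0 & n_1 t_2 - n_2 t_1 & n_1 t_3 - n_3 t_1 \\ n_2 t_1 - n_1 t_2 & 0 & n_2 t_3 - n_3 t_2 \\ n_3 t_1 - n_1 t_3 & n_3 t_2 - n_2 t_3 & 0 \end{pmatrix},$$

or

$$\mathbf{W}^* = \mathbf{W} + \mathbf{n}\mathbf{t}^T - \mathbf{t}\mathbf{n}^T.$$

Substituting for $\mathbf{n}^*$, $\mathbf{t}^*$, and $\mathbf{W}^*$ into the earlier equation,
and simplifying the results, we get:

$$\mathbf{W} + \mathbf{n}\mathbf{t}^T = \mathbf{P}.$$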


5.  Summary 


In this paper, we presented a closed form solution
for recovering the motion of an observer with respect to
a planar surface without having to compute the optical
flow as an intermediate step. We only need the image
intensity gradients at a minimum of 8 points. However,
in general, it is better to compute gradients in a larger
portion of the image to reduce the noise effects. We first
employed a constraint equation we developed for planar
surfaces to compute 9 intermediate parameters, the elements
of a 3x3 matrix. We referred to them as essential
parameters. The special structure of this matrix allows
us to compute the motion and plane parameters very
easily.




Appendix 

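In the previous sections, we showed how the motion
parameters can be recovered once the essential parameters
are known. However, the brightness change
constraint equation allowed us to determine the matrix
$\mathbf{P}'$ (a particular solution of $\mathbf{P}$ with the last element set
to zero). We showed that the two solutions are related
by:

$$\mathbf{P} = \mathbf{P}' + p_9 \mathbf{I}.$$

Here, we show how to determine $p_9$, and consequently
$\mathbf{P}$. For simplicity, let $p_9 = l$, and let $\mathbf{P} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^T$ denote
the eigenvalue decomposition of $\mathbf{P}$, where
$\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$, then:

$$\mathbf{P}' = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^T - l\mathbf{I} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^T - l\mathbf{U}\mathbf{V}^T.$$

If $\mathbf{L} = l\mathbf{I}$ then:

$$\mathbf{P}' = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^T - \mathbf{U}\mathbf{L}\mathbf{V}^T = \mathbf{U}(\boldsymbol{\Lambda} - \mathbf{L})\mathbf{V}^T.$$

Similarly, if $\mathbf{P}^* = \mathbf{P} + \mathbf{P}^T = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$ denotes the eigenvalue
decomposition of $\mathbf{P}^*$, then:

$$\mathbf{P}'^* = \mathbf{P}' + \mathbf{P}'^T = \mathbf{P} + \mathbf{P}^T - 2\mathbf{L} = \mathbf{P}^* - 2\mathbf{L} = \mathbf{U}(\boldsymbol{\Lambda} - 2\mathbf{L})\mathbf{U}^T.$$

In lemma 1, we showed that $\lambda_1 < \lambda_2 = 0 < \lambda_3$ (and
when $\mathbf{t} \parallel \mathbf{n}$, we get two zero eigenvalues). Therefore,
the eigenvalues of $\mathbf{P}'^*$ can be arranged in the form:

$$\lambda_1 - 2l < -2l < \lambda_3 - 2l.$$

It follows that $l = -\frac{1}{2}\lambda_2(\mathbf{P}'^*)$.

So in summary, we assume that $p_9 = 0$, and solve
for the essential parameters (elements of $\mathbf{P}'$). The eigenvalue
decomposition of $\mathbf{P}'^*$ allows us to determine the
unknown shift ($p_9 = -\frac{1}{2}\lambda_2$), and then $\mathbf{P}$ from:

$$\mathbf{P} = \mathbf{P}' - \frac{1}{2}\lambda_2(\mathbf{P}'^*)\,\mathbf{I}.$$

Finally, the solutions for the motion and structure parameters
are determined from the equations given earlier in
terms of the trace and eigenvectors of $\mathbf{P}^*$ (note that the
eigenvectors of $\mathbf{P}^*$ and $\mathbf{P}'^*$ are the same).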
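
The shift recovery can be sketched in Python. The eigenvalue routine below uses the standard trigonometric closed form for real symmetric 3x3 matrices, which is a technique substituted here for illustration; the paper does not say how the eigenvalues are to be computed. The names `symmetric_eigenvalues` and `recover_P` are likewise hypothetical.

```python
import math


def symmetric_eigenvalues(A):
    """Eigenvalues of a real symmetric 3x3 matrix, in ascending order."""
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    off = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    p2 = sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * off
    if p2 == 0.0:
        return (q, q, q)                      # A is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
           - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
           + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    hi = q + 2.0 * p * math.cos(phi)
    lo = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return (lo, 3.0 * q - lo - hi, hi)        # middle value from the trace


def recover_P(P_prime):
    """Return (P, p9, |n||t|) from the particular solution P' (with p'_9 = 0)."""
    S = [[P_prime[i][j] + P_prime[j][i] for j in range(3)] for i in range(3)]
    lam1, lam2, lam3 = symmetric_eigenvalues(S)
    p9 = -0.5 * lam2                          # shift from the middle eigenvalue
    P = [[P_prime[i][j] + (p9 if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    # The eigenvalue spread is unaffected by the shift and equals 2 |n| |t|.
    return P, p9, 0.5 * (lam3 - lam1)
```

With $|\mathbf{t}|$ set to one, the third return value is $|\mathbf{n}|$, which is the scale needed in Section 4.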

6. References

[1] Adiv, G., "Determining 3-D Motion and Structure
from Optical Flow Generated by Several Moving Objects,"
COINS TR 84-07, Computer and Information
Science, University of Massachusetts, Amherst, MA,
April 1984.

[2] Aloimonos, J., Chou, P.B., "Detection of Surface
Orientation and Motion from Texture: 1. The Case
of Planes," TR 161, Department of Computer Science,
Univ. of Rochester, Rochester, NY, January
1985.

[3] Ballard, D.H., Kimball, O.A., "Rigid Body Motion
from Depth and Optical Flow," TR 70, Computer
Science Department, Univ. of Rochester, Rochester,
NY, 1981.

[4] Bruss, A.R., Horn, B.K.P., "Passive Navigation,"
Computer Vision, Graphics, and Image Processing,
Vol. 21, pp. 3-20, 1983.

[5] Buxton, B.F., et al., "3D Solution to the Aperture
Problem," Proceedings of the Sixth European Conference
on Artificial Intelligence, pp. 631-640, September
1984.

[6] Fennema, C.L., Thompson, W.B., "Velocity Determination
in Scenes Containing Several Moving Objects,"
Computer Graphics and Image Processing, 9,
pp. 301-315, 1979.

[7] Hildreth, E.C., The Measurement of Visual Motion,
MIT Press, 1983.

[8] Horn, B.K.P., Schunck, B.G., "Determining Optical
Flow," Artificial Intelligence, Vol. 17, pp. 185-203,
1981.

[9] Kanatani, K., "Detecting the Motion of a Planar
Surface by Line and Surface Integrals," Computer
Vision, Graphics, and Image Processing, 29, pp. 13-22,
1985.

[10] Longuet-Higgins, H.C., Prazdny, K., "The Interpretation
of a Moving Retinal Image," Proc. of Royal
Society of London, Series B, Vol. 208, pp. 385-397,
1980.

[11] Longuet-Higgins, H.C., "A Computer Algorithm for
Reconstructing a Scene from Two Projections," Nature,
Vol. 293, pp. 131-133, 1981.

[12] Longuet-Higgins, H.C., "The Visual Ambiguity of a
Moving Plane," Proc. of the Royal Society of London,
B 223, pp. 165-175, 1984.

[13] Nagel, H., "On the Derivation of 3D Rigid Point
Configurations from Image Sequences," Proceedings
of Pattern Recognition and Image Processing, Dallas,
Texas, 1981.

[14] Negahdaripour, S., Horn, B.K.P., "Determining 3-D
Motion of Planar Objects from Brightness Patterns,"
Proceedings of Ninth Int. Joint Conf. on A.I.,
pp. 898-901, 1985.

[15] Negahdaripour, S., Horn, B.K.P., "Direct Passive
Navigation," to appear in IEEE Trans. Pattern Analysis
and Machine Intelligence.

[16] Roach, J.W., Aggarwal, J.K., "Determining the Movement
of Objects from a Sequence of Images," IEEE
Trans. Pattern Analysis and Machine Intelligence,
Vol. PAMI-2, November 1980.

[17] Subbarao, M., Waxman, A.M., "On the Uniqueness
of Image Flow Solutions for Planar Surfaces in Motion,"
Proc. of the Third Workshop on Computer Vision:
Representation and Control, pp. 129-140, 1985.

[18] Sugihara, K., Sugie, N., "Recovery of Rigid Structure
from Orthographically Projected Optical Flow,"
TR 8304, Dep. of Inf. Science, Nagoya University,
Nagoya, Japan, October 1983.

[19] Tsai, R.Y., Huang, T.S., Zhu, W.L., "Estimating
3-D Motion Parameters of a Rigid Planar Patch,
II: Singular Value Decomposition," IEEE Trans. on
Acoustics, Speech, and Signal Processing, Vol. ASSP-30,
No. 4, August 1982.

[20] Tsai, R.Y., Huang, T.S., "Uniqueness and Estimation
of 3-D Motion Parameters of Rigid Objects with
Curved Surfaces," IEEE Trans. on Pattern Analysis
and Machine Intelligence, Vol. PAMI-6, No. 1, January
1984.

[21] Waxman, A.M., Ullman, S., "Surface Structure and
3-D Motion from Image Flow: A Kinematic Analysis,"
CAR-TR-24, Comp. Vision Laboratory, Center
for Automation Research, University of Maryland,
College Park, MD, October 1983.

[22] Waxman, A.M., Wohn, K., "Contour Evolution, Neighborhood
Deformation and Global Image Flow: Planar
Surfaces in Motion," CAR-TR-58, Comp. Vision
Laboratory, Center for Automation Research, Univ.
of Maryland, College Park, MD, April 1984.


ANALYSIS OF AN ALGORITHM FOR DETECTION OF TRANSLATIONAL MOTION


Igor Pavlin, Edward Riseman and Allen Hanson

Computer and Information Sciences Department
University of Massachusetts, Amherst, MA 01003


ABSTRACT 

This report presents an extensive testing of an algorithm
for the recovery of translational motion parameters
of a sensor moving through a static environment. The algorithm
has been evaluated using synthetic images in terms
of the number of feature points matched between frames,
the relative angle between camera orientation and direction
of translation, uncorrelated and correlated noise, and
computational cost.

The algorithm appears to be robust across a very
wide range of camera translations, using as few as 8
feature points. When the angle between the direction of
translation and the direction of view is in a certain range of
angles the algorithm experiences difficulties.

In addition, an improvement in the speed and possibly
the accuracy of the search is suggested. By the reasonable
assumption of smoothness in the error surface, many
stages of iterative search may be avoided.

1.  INTRODUCTION 

This paper describes results of an extensive testing
of the algorithm developed by D. Lawton [LAW84] for recovery
of translational motion parameters. This algorithm
avoids the computation of an image displacement field prior
to recovery of the parameters for the axis of translational
motion. The axis of translation may also be specified by
the point where it intersects the image plane, called the
focus of expansion/contraction (abbreviated to FOE/C).

The global nature of the technique arises from the
use of many "interesting" or distinguishable points spread
over an image. Despite problems of noise, false feature
matches, occlusion of features, and other causes of unreliable
information in some parts of the image, the solutions
obtained can be quite stable. The robustness and accuracy
are a consequence of the algorithm's global and local characteristics:
a search for the minimum of the global error
associated with a set of points whose motion is jointly constrained,
and local correlation measurements to find the
best match of each point contributing to the global error.
Intuitively, the global nature of the search for the
correct position of the FOE/C is manifested in the many
"votes" of different displacement vectors for the position of
the FOE/C, and consequently the algorithm demonstrates
a high degree of robustness and accuracy.

This work was supported by the Defense Advanced Research
Projects Agency (DARPA) under grant N00014-82-K-0464.

This paper considers only the case of translational
motion. However, there are two closely related cases which
we believe will yield to a similar approach and results. For
example, translation of the camera constrained to a known
plane, with simultaneous rotation about the axis perpendicular
to that plane, is a two-dimensional problem. Similarly,
pure sensor rotation about a fixed axis is described
with only three parameters (two of them specifying the axis
of rotation and one the extent of the rotation). We did not
consider these cases of constrained motion in this paper,
but we believe their robustness will be similar, because the
same type of global constraints on the local feature matches
is available.

Evaluation of performance has been carried out by
examining the influence of the following parameters:

1. number of feature points in the image plane which
are matched between two frames.

2. the accuracy with which the direction of translation
can be recovered as a function of the relative orientation
of the camera line of sight to the direction of
motion.

3. resolution of the sampling of the FOE/C during the
search.

4. size of the window used for feature correlation between
frames.

5. resolution of the correlation matching for each feature
(i.e., step size of feature displacements for interpolation
and matching).

6. the efficiency of computation, and the robustness of
the method with respect to the correlated and uncorrelated
noise.

The motion sequence used in our experiments is created
using a computer graphics system which incorporates
ray-tracing techniques [WHI80]. The sequence is a reasonable
substitute for real-world images since light is handled
somewhat realistically (shadows, specular reflection,
decrease of the light intensity with the distance, and the
physical laws of reflection and refraction are incorporated).
The advantage of synthetic images is that they allow the
experimental situation to be controlled with an accurate
model of the camera motion in an environment. Evaluation
of the actual performance of the algorithm on real
world images will be performed as data bases of motion
sequences become available.

As we have mentioned, one of the issues associated
with the algorithm that we have addressed in this paper is
the efficiency of the search for the FOE/C, which involves
a global (sparse) sampling of the error surface followed by
a finer-resolution local hill-climbing search for the global
minimum of the error surface. There may be unnecessary
inefficiencies in this search algorithm. Also, when the error
surface around the minimum becomes relatively flat and exhibits
local small fluctuations, the local hill-climbing technique
might fail.

We suggest an improvement of the algorithm by introducing
a smoothness constraint on the error surface.
This approach is based upon strong intuitions about why
the error surface should be smooth if more than a few points
are tracked between a pair of frames. The global assumption
of smoothness allows the use of a surface fitting algorithm
to speed up the search for the FOE/C, while not
causing the performance to deteriorate significantly. The
approach may also allow the search for the minimum to
be more reliable by eliminating the local hill-climbing technique.

2.  BACKGROUND 

In this section we present a brief review of the Lawton
algorithm and the concepts concerning translational
motion that are necessary to follow the discussion of the
experiments in the following section. The reader is encouraged
to consult [LAW83, LAW84] for more details.

2.1 Translational Motion and Displacement Fields

A displacement field is defined as a vector field produced
by the changes in the position of the images of environmental
points over time [GIB50]. In the case of pure
translational motion of the camera in a static environment,
the intersection of the image plane and the vector describing
the translational motion of the focal point of the camera
is called the FOE/C. The displacement vector of any image
feature lies on the radial line connecting the FOE/C to the
feature, see Figure 1.

The camera model and the displacement field produced by
two environmental points A and B during the motion of the
camera specified by the translational vector t.

Figure 1: Camera model and FOE/C.

As few as two image points would be sufficient to identify
the correct translational motion. If the displacement of
two points that are not collinear could be accurately
tracked from one frame to another, the intersection of
their displacement lines in the image would determine the
FOE/C, which in turn unambiguously specifies the direction
of translation.
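
The two-point argument can be illustrated with a small
sketch. It is not part of the tested algorithm; the function
name and the point representation (x, y tuples for a feature
before and after the motion) are assumptions made only for
this example.

```python
def foe_from_two_tracks(p_a, q_a, p_b, q_b, eps=1e-12):
    """Intersect the displacement lines of two tracked features.

    p_a, q_a: position of feature A before and after the motion.
    p_b, q_b: position of feature B before and after the motion.
    Returns the intersection (x, y), or None when the two lines are
    (nearly) parallel and therefore do not determine a point.
    """
    (x1, y1), (x2, y2) = p_a, q_a
    (x3, y3), (x4, y4) = p_b, q_b
    dax, day = x2 - x1, y2 - y1          # displacement of feature A
    dbx, dby = x4 - x3, y4 - y3          # displacement of feature B
    denom = dax * dby - day * dbx        # 2-D cross product of the displacements
    if abs(denom) < eps:
        return None
    # Solve p_a + s * d_a = p_b + u * d_b for s.
    s = ((x3 - x1) * dby - (y3 - y1) * dbx) / denom
    return (x1 + s * dax, y1 + s * day)
```

With perfectly tracked features the returned point is the
FOE/C; with noisy tracks the two lines still intersect, but
the intersection is no longer reliable, which motivates the
use of redundant features discussed below.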

In practice, however, establishing the proper
correspondence between the parts of images in two or more
consecutive frames is difficult. Correspondence between
similar parts of two images is established using techniques
such as a scalar valued correlation function or symbolic
(token) matching of key image events. However, there are a
variety of problems that make such mechanisms unreliable:
the size of the local area to be searched, presence of
noise in the image, occlusion of surfaces that previously
were visible, insufficient resolution of the image, etc.

Redundancy and global constraints on the feature dynamics
across frames can be used to overcome these difficulties.
Since the displacement vectors of all features in a static
environment are constrained to emanate from the FOE/C, the
use of additional features should more accurately constrain
the correct position of the FOE/C. Use of redundant
features would compensate for the 'noise' introduced by
those features that provide weak or incorrect information
(e.g., features in low contrast or homogeneous regions,
features at occlusion boundaries). However, the use of a
large number of features is undesirable from the point of
view of computational efficiency.

2.2  Error  function  for  FOE/C  search 

The position of the FOE/C can be obtained in two basic
steps. The first step is the extraction of features, and
the second step is the search for the position of the
FOE/C. The whole search procedure is symbolized in Figure 2.

The feature extraction process is responsible for the
extraction of distinguishing points which can be tracked
from one frame to another. Contour points of high curvature
are a good choice because they are less likely to produce
ambiguous matches in succeeding frames. The contours can be
extracted using a variety of techniques: thresholding,
zero-crossings, boundary curvature measures, local contrast
measurements, sharpness of the autocorrelation function,
etc.

The direction of translation is then found by a search
across hypothesized FOE/C positions in the image plane.
For each hypothesized position of the FOE/C and for each
feature in the image the best correlation between frames
is found along the radial line connecting the hypothesized


The position of the FOE/C in the image plane is
hypothesized. The FOE/C itself is at the intersection of
the image plane and the translation vector t. The position
of the FOE/C also uniquely determines a point on the unit
sphere where the translation vector t pierces the unit
sphere. At that point, the error associated with the
mismatch of the features A, A1 and B, B1 is calculated. For
a different FOE/C the best match for features A and B might
be found along a radial path at the positions A2 and B2.
The error at that FOE/C is expected to be almost always
higher when the position of the assumed FOE/C is
significantly incorrect. This process is repeated for many
hypothesized positions of the FOE/C, resulting in the error
surface on the unit sphere. The position of the minimum of
the error surface determines the selection for the correct
axis of translation (or, equivalently, the correct FOE/C).

Figure  2:  Search  for  the  FOE/C. 


FOE/C and the feature. If a feature, say A, has moved in
the subsequent image to the position A1 some distance d
along the radial line, then a subimage around A is expected
to correlate nearly perfectly exactly d units along the
radial line in the next frame. An equivalent representation
of the information is given as an error measure between
features A and A1:

$$\mathrm{error}_{\mathrm{feature}\,A} = 1 - \mathrm{corr}(\mathrm{feature}_A, \mathrm{feature}_{A_1}).$$
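
As an illustration of the error measure, the sketch below
computes 1 - corr for two equally sized windows. It uses a
zero-mean normalized correlation as a stand-in; the
experiments reported here used the Moravec correlation, so
the function names and the choice of correlation are
assumptions of this example only.

```python
import math

def normalized_correlation(win_a, win_b):
    """Zero-mean normalized correlation of two equal-size windows.

    Windows are lists of rows of gray levels. Returns a value in
    [-1, 1]; a window with no contrast yields 0.0 by convention.
    """
    a = [v for row in win_a for v in row]
    b = [v for row in win_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0.0 else 0.0

def match_error(win_a, win_b):
    """Error measure for one feature: 1 - corr(feature_A, feature_A1)."""
    return 1.0 - normalized_correlation(win_a, win_b)
```

A perfect match gives an error of 0, and the total error for
a hypothesized FOE/C is obtained by accumulating this value
over all features.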

Several measures were examined by Lawton for feature
matching: the normalized correlation, the Moravec
correlation (which was used in this paper) [MOR77], and
the normalized absolute value difference. These correlation
functions differ in terms of speed and precision. The speed
increases and the accuracy decreases in the order listed
above. The size of the n x n correlation window can affect
accuracy (in this paper the size is varied from 3 to 11).
In addition, one must consider the alternative forms of
interpolation, because subimages from one frame are
displaced to positions in the next frame that are not
aligned with pixels; here we use bilinear interpolation.
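
A minimal sketch of bilinear interpolation at a position
that is not aligned with the pixel grid follows. The image
layout (a list of rows, x as column, y as row) and the
clamping at the image border are assumptions of this
example, not details taken from the tested implementation.

```python
def bilinear(image, x, y):
    """Gray level at the real-valued position (x, y).

    image[row][col] holds the gray levels; x is the column
    coordinate and y the row coordinate. Positions outside the
    image are clamped to the border. The image must be at least
    2 x 2 pixels.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0 = min(int(x), w - 2)              # left column of the 2 x 2 cell
    y0 = min(int(y), h - 2)              # top row of the 2 x 2 cell
    fx, fy = x - x0, y - y0              # fractional offsets inside the cell
    top = image[y0][x0] * (1.0 - fx) + image[y0][x0 + 1] * fx
    bottom = image[y0 + 1][x0] * (1.0 - fx) + image[y0 + 1][x0 + 1] * fx
    return top * (1.0 - fy) + bottom * fy
```

Sampling each pixel of a displaced window this way allows
windows to be compared at sub-pixel displacements along the
radial line.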

As an alternative, we are currently examining the use of
symbolic features, or 'tokens', i.e., interesting points
with a set of attribute-value pairs, for use in symbolic
matching. The replacement of correlation matching with
symbolic matching has the potential of significantly
increasing the speed of computation¹, but it was not used
here.

If the position of the hypothesized FOE/C is very close to
the correct FOE/C then most of the features will return
good matches and small errors, resulting in a very small
total error for that particular position of the FOE/C. If
the position of the hypothesized FOE/C is far from the
correct position, the computed total error will be
significant, since many features will return a poor match.
Proceeding in this fashion an error surface can be
constructed over the image plane, with the minimal error
expected at the position of the correct FOE/C. To achieve a
more uniform sampling of the hypothesized positions of the
FOE/C, it is much more convenient to represent the
orientation of the axis of translation and the associated
error values on the unit sphere; this is accomplished by
projecting error surface values onto the unit sphere
centered at the camera focus.
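
The projection onto the unit sphere can be sketched as
follows, assuming a pinhole camera at the origin looking
along the +z axis with the image plane at z = focal_length.
These conventions and the function names are assumptions of
the example.

```python
import math

def foe_to_unit_sphere(x, y, focal_length):
    """Map an image-plane FOE/C position to a point on the unit sphere.

    The ray from the camera focus through (x, y, focal_length) is
    normalized to unit length.
    """
    norm = math.sqrt(x * x + y * y + focal_length * focal_length)
    return (x / norm, y / norm, focal_length / norm)

def angle_from_line_of_sight(x, y, focal_length):
    """Angle (degrees) between the translation axis and the line of sight."""
    return math.degrees(math.atan2(math.hypot(x, y), focal_length))
```

Under these assumptions an axis 85° away from the line of
sight corresponds to an image-plane position more than 11
focal lengths from the image center, whereas on the sphere
it is an ordinary sample point, which is why sampling on the
sphere is more uniform.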

The error surface on the unit sphere can be constructed
starting with a coarse sampling over polar angles. In the
experiments that follow, the hypothesized position of the
FOE/C on the unit sphere is specified by the angles α and
β, where α is the angle between the projection of the
translational axis on the Y-Z plane and the z-axis, and β
is the angle between the projection of the translational
axis on the X-Z plane and the z-axis. The resolution of the
sampling is one factor determining the precision of the
method and the speed of the search. The grid spacing for
sampling in the (α, β) coordinate system was roughly 45,
22.5, and 11.25 degrees in our experiments.

Once the coarse sampled error surface is found, the


1 Lawton, private communication




all experiments the displacement of the camera was
calculated so that the maximal displacement of the nearest
point on the checkerboard was about 7 pixels and the
maximal displacement of the farthest point on the
checkerboard was about 4 pixels.

First, a set of translations in the Y-Z plane is generated,
starting from the line of sight (z-axis), and increasing α
by increments of 15° from 0° to 90°, and then by increments
of 30° to 180° (keeping β = 0). The translations parallel
to the image plane in the directions along the x-axis and
the bisector of the first X-Y quadrant are also analyzed,
as well as the translation along an 'arbitrary' direction
specified by the angles (α, β) = (60°, 30°).

Parameters are varied from a default set of parameters.
The defaults are: correlation window displacements of 1
pixel (with a maximum allowed displacement of 10 pixels),
window size of 7x7, sampling in polar coordinates of
roughly every 45°, and local search precision of 0.005
radians. Tables 1 through 6, given in the Appendix, present
the results for images without noise. Table 7 presents
results when one of the frames is corrupted with
uncorrelated noise of varying strength. Table 8 presents
results when one of the frames is corrupted with correlated
noise of varying strength.

The number of samples for the FOE/C during the entire
search (combined global and local) was about 60, 160 and
375 (Tables 1, 2, and 3, respectively). There are
approximately 300,000 operations/FOE/feature, bringing the
total computation time to roughly 1 second per feature per
position of the FOE/C (on the VAX 11-780). The global
search for the minimum of the error surface is the
time-consuming part of the search (for example, in the
experiments given in Table 3, it required almost 90% of the
computation time). We would like to stress here, however,
that the algorithm and the environment in which it was run
were designed for programming flexibility in an
experimental development process, and not in any way
optimized for speed. Thus, the computational speed can be
greatly improved.
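
The cost figures quoted above combine into a simple
estimate. The sketch below only restates that arithmetic;
the function name is hypothetical and the one-second figure
is the rough value given in the text.

```python
def estimated_cpu_minutes(num_foe_samples, num_features, seconds_per_eval=1.0):
    """Rough CPU time: one evaluation per feature per hypothesized FOE/C."""
    return num_foe_samples * num_features * seconds_per_eval / 60.0

# About 60 FOE/C samples with 4 features: roughly 4 minutes.
coarse = estimated_cpu_minutes(60, 4)
# About 375 FOE/C samples with 16 features: roughly 100 minutes.
fine = estimated_cpu_minutes(375, 16)
```

These estimates are of the same order as the CPU times
listed with the tables in the Appendix.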

3.2 Discussion of Experimental Results

Before we consider the results of individual experiments,
several general remarks about the shape of the error
surface for a camera moving in a static environment are in
order:

1. The greater the number of features, the "smoother" the
error surface is expected to be.

2. The use of more features increases the computation
time proportionally.

3. With fewer features, the percent of the contribution
of one feature to the total error value is greater, and
if some features are "weak", or if their number is
small, the error surface is rougher.

4. The closer the feature is to the FOE/C, the less
reliable is its contribution.

5. Features further from the FOE/C usually have larger
displacements (although displacement is also a function
of the depth of the environmental point) and therefore
predict more accurately the orientation of the radial
line on which the FOE/C should lie.

6. Since the position of the FOE/C is unknown, the
features should be spaced more or less uniformly
throughout the image plane.

Let us consider for a moment the difficulties with the
case of translational motion parallel (or almost parallel)
to the image plane (i.e., the FOE/C is at approximately 90°
to the direction of the line of sight), which requires a
highly sensitive search in order to find the minimum of
global error. In this case the FOE/C is far away from the
center of the image plane (Figure 4).

Assume a window of size $w \times w$ is centered at
$(x, y)$ and moved a distance $d$ towards the FOC (the
analysis is similar for the FOE). If the distance of the
FOC is $n \cdot D$, $n > 1$, where $2D \times 2D$ is the
size of the image plane, then according to Figure 4 we have:

$$d_x / d = x / (n \cdot D + y) \approx x / (n \cdot D)$$

or, for $x \approx D$,

$$d_x \approx d / n.$$
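
A numerical illustration of this relation, d_x / d =
x / (n D + y), is sketched below. The function names are
hypothetical, and the 7-pixel displacement is taken from the
maximal displacement reported for the experiments.

```python
def lateral_displacement(d, x, y, n, half_size):
    """Lateral shift d_x of a window at (x, y) moved d pixels towards
    an FOC located n * half_size away, using d_x / d = x / (n * D + y)."""
    return d * x / (n * half_size + y)

def approx_lateral_displacement(d, n):
    """Approximation d_x = d / n, valid for x close to D and y << n * D."""
    return d / n

# A 7-pixel displacement with the FOC about 10 half-widths away
# produces a lateral shift of only about 0.7 pixel.
shift = approx_lateral_displacement(7.0, 10.0)
```

A lateral shift well below one pixel has to be detected by
the correlation function, which explains the loss of
sensitivity for motions nearly parallel to the image plane.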


When the FOC is far away from the image plane, a
displacement of a feature by $d$ units towards the FOC
causes a lateral displacement of the feature of roughly
$d_x \approx d/n$.

Figure 4: Search for a distant FOC.




For example, if the FOC corresponds to the translation of
75° or 85° away from the line of sight, then $n \approx 4$
and $n \approx 10$, respectively. From these equations we
can see that the correlation function must be able to
detect a change of feature position in the x-direction of
the order of $d_x$. However, $d_x$ is expected to be very
small when the position of the FOE/C is far from the image
center, since the size of the displacement $d$ is limited,
at best, by the image size. More importantly, the
displacement is usually assumed to be small to avoid
ambiguities in the matching process. If the correlation
function is not able to detect changes of size $d_x$ in the
displacement of a feature in the lateral direction, then
the position of the FOE/C is unknown with an angular
uncertainty of $d_x/d$. When the FOE/C is closer to the
center of the image, the angular resolution of the method
is also limited by the ability of the correlation function
to detect such small lateral differences during feature
displacements, but the relative error in the position of
the FOE/C is much smaller.

The sensitivity of the correlation function also depends
on the size of the window and the type of the correlation
function. It is clear that the averaging nature of the
correlation function (being a sum of products) works
against its sensitivity. Thus, windows that are larger do
not necessarily imply better results.

The tables presented in the Appendix are considered next.
In these tables, the second column represents the
directions (in polar coordinates) in which the camera is
actually translated and hence are the values which the
algorithm is supposed to recover. We refer to these
directions as 'correct' values. The third, fourth and fifth
columns specify deviations (in degrees) from the correct
values from experimental runs for 4, 8 and 16 feature
points, respectively.

In some cases the local search fails and it is not
possible to recover the correct axis of translation. These
cases are marked with an A (for ambiguous) in the tables.
Tables 1 through 3 show that the ambiguous results appear
mostly for motions almost perpendicular to the line of
sight, where small fluctuations in the error surface are
most probably due to the decreasing sensitivity of the
correlation function to small displacements. In these
cases, the actual value returned by the algorithm was the
coarse grid sampling point with the smallest error. Because
of the noisy nature of the error surface, the finer grained
local search cannot utilize the assumption of the convexity
of the error surface in order to find the total minimum.
Rather, the local search returns a value close to the value
returned by the global search. Note that decreasing
coarseness of the global search results in a smaller number
of ambiguous values (compare Tables 1, 2, 3). Thus, failure
to recover the correct axis is indeed due to the local part
of the search, and therefore the results are not really a
failure of the FOE/C method. (That is why we label them as
ambiguous and not as errors.)


It is clear that 16 points seem to give adequately
reliable results (see Table 3), with errors of only a few
degrees from the correct axis. It is plausible that 32 or
64 feature points would improve the precision of the search
for the FOE/C. In most tables the results shown in one row
suggest an improvement of accuracy as the number of
features is increased. For four feature points the results
are not accurate, but are relatively close to correct
results, especially when camera translation is along the
line of sight. In real-world images, this probably would
not be a sufficient number of features.

A finer division of the window displacements (sampling
every third of a pixel; Table 4) does not produce much
improvement with respect to the default set of parameters,
demonstrating that the correlation measure between two
windows has a relatively broad peak at the maximum.

Surprisingly, a window of smaller size (3x3) gave quite
satisfactory results and in some cases recovered the axis
of translation when larger windows failed to do so (compare
Tables 1 and 5). Since the computation time with smaller
windows is smaller, this window size might be used during
the initial stages of computation. A window of size 11x11
did not show results any better than the default window
size of 7x7.

Supplying an initial guess close to the correct axis of
translation (Table 6) recovered the correct value in almost
all cases. The search is done locally around the initial
axis in a neighborhood $\delta_0$. The significance of this
result is in showing that the correlation measure is able
to detect the changes of axial positions to within a few
degrees. However, one should not be misled by the
'correctness' of the results. Since we have limited our
search space here only to a very small $\delta_0$
neighborhood around one of the points on the error surface,
any returned value for the minimum will be not more than
$\pm 2\delta_0$ away from the initial guess. This estimate
is a consequence of the fact that each time a new local
minimum is found, the new area in which the search is
continued has half the diameter of the old one
($\delta_0 + \delta_0/2 + \delta_0/4 + \ldots = 2\delta_0$).
Thus, the primary problem is to determine the correct
neighborhood for local search and therefore the global
search must have sufficient resolution.
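
The bound on how far the local search can move from the
initial guess follows from the geometric series. A small
sketch, with a hypothetical function name and an
illustrative neighborhood of 5 degrees, shows the partial
sums approaching twice the initial neighborhood without
reaching it.

```python
def max_drift(delta0, num_refinements):
    """Largest possible total shift of the search center when each
    refinement can move it by at most the current neighborhood size
    and the neighborhood is halved after every refinement."""
    total, radius = 0.0, delta0
    for _ in range(num_refinements):
        total += radius
        radius /= 2.0
    return total

# With an initial neighborhood of 5 degrees the drift stays below 10 degrees.
bound_after_30_steps = max_drift(5.0, 30)
```

Whatever the number of refinements, the returned axis cannot
lie further than twice the initial neighborhood from the
starting point, so a poor starting point cannot be
corrected by the local search alone.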

Table 7 presents the results of the search for the correct
translational axis when one of the images was corrupted
with uniform white noise. The corrupted image is shown in
Figure 5.

The program parameters were the same as those of Table 3,
which was judged to represent the best results in the set
of experiments. The number of features was 16; therefore
the last column of Table 3 should be compared with the
results in Tables 7 and 8. Noise was uniformly distributed,
with an intensity range of 0 to 100, a mean of 50, and a
standard deviation of 30. The gray level values of the
original image ranged from a minimum of -88 to a



This image is obtained by adding white noise to the image
shown in Figure 3.

Figure 5: Synthetic image used in experiments with added
white noise.


maximum of 80, with an average of -54.0 and a standard
deviation of 17.0. The intensity range of the noise added
to this image was approximately 6%, 12%, 30% and 60% of
the intensity range of the uncorrupted image. The results
show the stability of the search in the presence of noise,
with a graceful degradation of performance as the noise
levels increase.

Finally, in Table 8 we present the results of runs on
images with correlated noise. The correlated noise was
obtained by averaging the white noise plane over a 5x5
neighborhood. This decreased the range of noise intensities
to 24 to 72 (approximately a half of the white noise range)
with a mean of 48.85 and a standard deviation of 5.8
(noticeably smaller than the standard deviation of the
white noise; see Figure 6).
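
The construction of correlated noise by averaging a white
noise plane over a 5x5 neighborhood can be sketched as
follows. The plane size, the random seed, and the handling
of the border (only fully covered neighborhoods are kept)
are assumptions of this example.

```python
import random
import statistics

def white_noise(height, width, low=0.0, high=100.0, seed=0):
    """Plane of independent, uniformly distributed noise values."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(width)]
            for _ in range(height)]

def box_average(plane, size=5):
    """Average the plane over size x size neighborhoods (valid region only)."""
    h, w = len(plane), len(plane[0])
    out = []
    for r in range(h - size + 1):
        row = []
        for c in range(w - size + 1):
            total = sum(plane[r + i][c + j]
                        for i in range(size) for j in range(size))
            row.append(total / (size * size))
        out.append(row)
    return out

def flat_std(plane):
    """Population standard deviation of all values in the plane."""
    return statistics.pstdev([v for row in plane for v in row])

white = white_noise(64, 64)
correlated = box_average(white, 5)
```

Averaging 25 independent values divides the standard
deviation by about 5, which is consistent with the drop from
roughly 30 to 5.8 reported above.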

Tables 7 and 8 suggest that the algorithm is robust in the
presence of noise. We are further exploring the
relationship between noise levels, error values returned by
the correlation function, and the overall performance of
the algorithm.

4.  FURTHER  WORK 

This section describes some possible improvements to
Lawton's algorithm and discusses the search for the FOE/C
from the point of view of a regularization problem [POG84].

In many experiments done by Lawton the error surface is
reported to exhibit an overall 'smooth' behavior. This is
to be expected for the following reasons. The correlation
function between the two features is a relatively smooth
function of the feature displacement (provided that the
displacement is small, that there is no occlusion, and no
dramatic local changes take place). The total error
function, which is a sum of many smooth functions, should
also be a smooth function. The assumption of slow temporal
changes of direction of translation, often used in this
kind of experiment, also implies a slow change of the
position of the minimum of the total error on the image
plane.

We have suggested that in some cases the error surface may
not, in fact, be smooth. The hypothesis is that the local
search may fail to return the correct location of the FOE/C
because it has found a false minimum due to the presence of
noise in the error surface. We are currently investigating
this problem, and if the hypothesis proves to be correct,
we intend to explore it using the notion of
'regularization'. This idea is perhaps best expressed in
the work by Poggio and Torre [POG84], although it is found
in many early vision algorithms [GRI81, TER84]. In general,
the theory assumes that many problems in early vision are
of an 'inverse', 'ill-posed' type. Roughly speaking,
ill-posed problems do not provide a unique solution and the
space of acceptable solutions has to be restricted by
choosing an appropriate 'stabilization' functional.

Our hypothesis, then, is that the problems that led to the
failure to recover the correct translational axis are
exactly due to the ill-posed nature of the problem and that
a regularization functional is needed to guide the search
for the minimum of the error surface. The problem with the
recovery of the correct translational axis can be partially

This image is obtained by adding correlated noise to the
image shown in Figure 3.

Figure 6: Synthetic image used in experiments with added
correlated noise.




overcome by imposing a smoothness constraint on the error
surface. By forcing the error surface to be smooth, small
fluctuations in the error surface (due to the various
problems discussed in this paper) should be eliminated and
the search for the minimum facilitated by the smoother
'valley' around the minimum. The information about the
position of the minimum will be supported by the overall
shape of the surface in this area, and therefore it will
reflect the contribution of many points on the error
surface.

The other advantage of this approach is that the fitting
procedure can speed up the search process. Due to the
smoothness assumption, one might start with a rather sparse
set of the error values (FOE/C positions) and interpolate
the values in the search for the global minimum. The search
can then be localized and eventually repeated with a finer
grid in a smaller region where the fitted surface has the
minimum. Selection of the initial number of points in the
sparse set of data requires a compromise between the
validity of the surface shape and the efficiency of the
method.
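
One simple way to interpolate a sparse set of error values
is sketched below: the coarse grid minimum is refined by
fitting a parabola through three neighboring samples along
each angular axis. This separable parabolic fit is only a
stand-in chosen for illustration; it is not the surface
fitting procedure proposed here, and all names are
hypothetical. The grid is assumed to be equally spaced.

```python
def parabola_vertex(x_left, x_mid, x_right, e_left, e_mid, e_right):
    """Abscissa of the vertex of the parabola through three equally
    spaced samples; falls back to x_mid when the samples do not
    form a valley."""
    step = x_mid - x_left
    curvature = e_left - 2.0 * e_mid + e_right
    if curvature <= 0.0:
        return x_mid
    return x_mid + 0.5 * step * (e_left - e_right) / curvature

def refine_minimum(error, alphas, betas):
    """Coarse grid search followed by a separable parabolic fit."""
    grid = [[error(a, b) for b in betas] for a in alphas]
    cells = [(i, j) for i in range(len(alphas)) for j in range(len(betas))]
    i, j = min(cells, key=lambda ij: grid[ij[0]][ij[1]])
    a_hat, b_hat = alphas[i], betas[j]
    if 0 < i < len(alphas) - 1:          # interior sample along alpha
        a_hat = parabola_vertex(alphas[i - 1], alphas[i], alphas[i + 1],
                                grid[i - 1][j], grid[i][j], grid[i + 1][j])
    if 0 < j < len(betas) - 1:           # interior sample along beta
        b_hat = parabola_vertex(betas[j - 1], betas[j], betas[j + 1],
                                grid[i][j - 1], grid[i][j], grid[i][j + 1])
    return a_hat, b_hat
```

Because the estimate uses the shape of the surface around
the coarse minimum rather than a single sample, it needs
far fewer error evaluations than a dense local search, at
the price of assuming that the surface is smooth there.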

5. SUMMARY

The report describes a set of experimental runs designed
to determine the performance of a motion algorithm for
detection of translational motion. The performance of the
algorithm has been tested for various translational
directions on a pair of synthetic images. The overall
performance of the algorithm for cases where the camera
motion is within 45° of the direction of the line of sight
was very good: robust with respect to noise and accurate.
The algorithm can return the translational axis to within a
few degrees of the correct axis even if the images are
subject to significant amounts of noise.

It was found that the algorithm is not adequately
sensitive to translations almost normal to the line of
sight. The problem was found to be not only one of
insensitivity of the correlation function, but also one
caused by the failure of the local search for the possible
positions of the FOE/C in the case that the error surface
around the minimum exhibits small fluctuations.

We have suggested an approach which would impose a
constraint on the smoothness of the error surface. In a
forthcoming paper we will demonstrate that this approach
increases both speed and reliability of the search for the
correct translational axis in cases where the tested
algorithm had difficulties.

ACKNOWLEDGMENT 

We would like to thank those people who contributed their
time and effort to both the intellectual and system support
environment within which this research was done, especially
Daryl Lawton, Gilad Adiv, P. Anandan, Bob Heller,
Michael Boldt, Brian Burns, and Seraj Bharwani.


APPENDIX 

In this appendix a summary of runs is presented. The set
of default values assumed for the parameters in the tested
algorithm is as follows (for details see Section 2.3 and
Section 3):

1. The step size (in pixels) of the correlation matching
of each feature is one pixel.

2. The maximum distance a feature is moved in the search
for the best correlation is 10 pixels.

3. The size of the window is 7x7 pixels.

4. The precision with which the translational axis was
determined in the (α, β) space is $\delta_m \approx 0.005$
radians.

5. The density of the initial global search for the
minimum of the error surface is roughly 45°. The number
of steps in local search is determined by the size of
$\delta_m$.

The tables summarize the results obtained from multiple
runs of the algorithm under varying conditions. Tables 1
through 6 are for images without noise, Table 7 presents
results with uncorrelated uniform white noise, and Table 8
presents results with correlated noise. The following
information is relevant for all the tables:

• The parameter that is changed from the default value,
indicated above, is specified in the title of the table.

• The second column of the tables represents correct
values for the direction of motion, where α is the angle
between the projection of the translational axis on the
Y-Z plane and the z-axis, and β is the angle between the
projection of the translational axis on the X-Z plane
and the z-axis.

• Other columns give experimental results (in degrees)
for deviation of angles α and β from the correct values.

• Symbol A stands for the "ambiguity" in the search for
the right axis, a phenomenon discussed in Section 3.2.

• At the end of each table a typical amount of CPU time
on the VAX 11-780 is given. This time represents time
spent in both global and local search.




Table 1

Correct and experimental values for the translational axis,
using the initial (default) set of parameters in the
algorithm. (Global sampling approximately every 45°.)

Exp.  Correct axis                   Experiment
No.   (α, β)        4 features     8 features     16 features
 1    (  0,  0)     (-0.2, -0.7)   (-0.7,  0.9)   (-0.3,  1.4)
 2    ( 15,  0)     (13.6,  2.2)   ( 0.8,  1.2)   ( 1.2, -0.5)
 3    ( 30,  0)     A              ( 3.4,  1.5)   ( 1.6,  0.5)
 4    ( 45,  0)     A              A              A
 5    ( 60,  0)     A              A              A
 6    ( 75,  0)     A              A              A
 7    ( 90,  0)     A              A              A
 8    (120,  0)     (38.2,  0.0)   A              A
 9    (150,  0)     ( 9.2,  0.0)   ( 8.2,  0.0)   ( 8.2,  0.0)
10    (180,  0)     A              A              A
11    ( 45, 90)     A              A              A
12    (  0, 90)     A              A              A
13    ( 60, 30)     A              A              A
CPU time            5 min.         10 min.        15 min.


Table 2

Increased density of points during global sampling over the
unit sphere (approximately every 22.5 degrees, other
parameters are the same as in Table 1).

Exp.  Correct axis                   Experiment
No.   (α, β)        4 features     8 features     16 features
 1    (  0,  0)     (-0.2, -0.7)   (-0.7,  0.9)   (-0.3,  1.4)
 2    ( 15,  0)     (13.6,  2.2)   ( 0.8,  1.2)   ( 1.2, -0.5)
 3    ( 30,  0)     (20.7,  4.2)   ( 3.5,  1.5)   ( 1.7,  0.5)
 4    ( 45,  0)     (16.7,  5.0)   (10.7,  3.4)   (-0.1,  031)
 5    ( 60,  0)     A              A              A
 6    ( 75,  0)     A              A              A
 7    ( 90,  0)     A              A              A
 8    (120,  0)     (21.3,  0.0)   (21.3,  0.0)   (21.3,  0.0)
 9    (150,  0)     (18.7,  0.0)   (-8.7,  0.0)   ( 8.2,  0.0)
10    (180,  0)     A              A              A
11    ( 45, 90)     A              A              A
12    (  0, 90)     A              A              A
13    ( 60, 30)     A              A              (-3.4, -30)
CPU time            9 min.         18 min.        26 min.


Table 3

Increased density of points during global sampling over the
unit sphere (approximately every 11.5 degrees, other
parameters are the same as in Table 1).

Table 4

Displacements between the windows are 1/3 of the pixel
size. Other parameters are the same as in Table 1.

Exp.  Correct axis                   Experiment
No.   (α, β)        4 features     8 features     16 features
 1    (  0,  0)     (-1.2,  0.0)   (-1.7,  0.0)   (-0.4,  2.1)
 2    ( 15,  0)     ( 8.4,  1.9)   ( 1.5,  2.1)   ( 1.2, -0.4)
 3    ( 30,  0)     A              (-0.4,  1.2)   (-3.4, -0.4)
 4    ( 45,  0)     A              A              A
 5    ( 60,  0)     A              A              A
 6    ( 75,  0)     A              A              A
 7    ( 90,  0)     A              A              A
 8    (120,  0)     (38.2,  0.0)   A              A
 9    (150,  0)     ( 8.2,  0.0)   ( 8.2,  0.0)   ( 8.2,  0.0)
10    (180,  0)     A              A              A
11    ( 45, 90)     A              A              A
12    (  0, 90)     A              A              A
13    ( 60, 30)     A              A              A
CPU time            11 min.        15 min.        23 min.



Table 5

Window size is 3x3. Other parameters are the same as in
Table 1.

Exp.  Correct axis                   Experiment
No.   (α, β)        4 features     8 features     16 features
 1    (  0,  0)     (-2.4,  0.0)   (-2.2,  0.5)   (-1.2,  1.0)
 2    ( 15,  0)     (-1.8,  0.0)   ( 2.5,  2.2)   (-1.5, -2.1)
 3    ( 30,  0)     (-0.4,  0.0)   (-6.1, -0.2)   (-4.2, -2.1)
 4    ( 45,  0)     ( 1.7,  0.2)   A              ( 0.2,  1.2)
 5    ( 60,  0)     (-3.9,  1.0)   (-8.*,  0.7)   A
 6    ( 75,  0)     A              A              A
 7    ( 90,  0)     (43.6, -0.2)   A              A
 8    (120,  0)     (13.6, -0.2)   A              A
 9    (150,  0)     ( 8.2,  0.0)   (11.7,  0.0)   ( 8.2,  0.0)
10    (180,  0)     A              A              A
11    ( 45, 90)     A              A              A
12    (  0, 90)     A              A              A
13    ( 60, 30)     (-40, -15.5)   A              A
CPU time            6.5 min.       6.8 min.       7.3 min.


Table 7

Addition of uniform, uncorrelated, white noise. Otherwise,
same parameters as in the experiments in Table 3, with 16
features.

No.   (α, β)       6%*            12%            30%            60%
 1    (  0,  0)    ( 0.2,  1.4)   (-0.7,  1.7)   ( 1.2,  1.2)   ( 0.4, 11.5)
 2    ( 15,  0)    ( 1.2, -0.2)   ( 2.1, -0.2)   (-0.7,  1.2)   ( 0.2, -2.1)
 3    ( 30,  0)    (-2.5, -0.2)   (-3.6, -0.7)   (-2.2, -0.2)   (-3.5, -2.1)
 4    ( 45,  0)    ( 1.2,  0.2)   ( 0.2,  0.2)   (-6.2, -2.2)   ( 2.7,  2.6)
 5    ( 60,  0)    ( 6.2,  1.7)   ( 4.6,  0.7)   ( 4.6,  0.2)   ( 0.2,  5.6)
 6    ( 75,  0)    A              A              A              A
 7    ( 90,  0)    A              A              A              A
 8    (120,  0)    ( 0.8,  3.2)   ( 0.2,  0.0)   ( 0.2,  0.0)   ( 0.8,  0.0)
 9    (150,  0)    ( 8.2,  0.0)   (-1.2,  0.0)   (-1.0,  0.0)   (14.6, -5.2)
10    (180,  0)    A              A              A              A
13    ( 60, 30)    ( 3.4, -30)    ( 3.4, -30)    (-3.7, -30)    ( 3.4, -30)
CPU time           1 hour         1 hour         1 hour         1 hour
* (of uncorrelated noise)


Table 6

Initial guess for the translation axis is supplied as an input
parameter to the algorithm. Other parameters are the same
as in Table 1.

                                      Experiment
No.   (α, β)       4 features     8 features     16 features
 1    (  0,  0)    ( 2.7, 22)     ( 2.5,  0.9)   ( 2.4,  1.4)
 2    ( 15,  0)    (13.7,  2.2)   ( 02,  12)     ( 12,  2.5)
 3    ( 30,  0)    (21.0,  42)    ( 3.5,  1.6)   (-22,  2.4)
 4    ( 45,  0)    (16.1,  42)    (10.7,  3.4)   ( 02,  0.4)
 5    ( 60,  0)    (15.0, 10.5)   ( 8.7,  7.6)   ( 32,  0.0)
 6    ( 75,  0)    F*             (25.2, -82)    ( 22, -42)
 7    ( 90,  0)    (132,  1.8)    F              F
 8    (120,  0)    (20.7,  2.2)   ( 8.0, -3.4)   ( 7.5, -1.6)
 9    (150,  0)    ( 0.1,  0.0)   ( 0.1,  0.0)   ( 0.1,  0.0)
10    (180,  0)    A              A              A
11    ( 45, 90)    A              A              A
12    (  0, 90)    A              A              A
13    ( 60, 30)    (11.8, 28.1)   (-7 1,  22)    ( 92, 772)
CPU time           11 min.        13.5 min.      18 min.

* (fails to converge)


Table 8

Addition of uniform, correlated noise. Otherwise, same
parameters as in the experiments in Table 3, with 16
features.

                                      Experiment
No.   (α, β)       6%*            12%            30%            60%
 1    (  0,  0)    (-1.4,  12)    ( 0.7,  02)    ( 02,  12)     (-02,  3.6)
 2    ( 15,  0)    ( 1.6, 22)     (-12, 02)      (-12, -22)     (-1.4, -1.0)
 3    ( 30,  0)    (-32,  2.7)    (-32,  2.7)    (-4.0,  3.0)   (-5.1, 22)
 4    ( 45,  0)    ( 2.1,  02)    (-5.2, -1.4)   (-1.0,  02)    ( 3.9,  5.4)
 5    ( 60,  0)    ( 3.7,  02)    ( 22, -1.4)    ( 5.7,  2.4)   ( 2.0,  1.0)
 6    ( 75,  0)    A              A              A              A
 7    ( 90,  0)    A              A              A              A
 8    (120,  0)    ( 0.8,  0.0)   ( 0.8,  0.0)   ( 3.7,  02)    ( 9.8,  0.0)
 9    (150,  0)    ( 82,  0.0)    (-12,  02)     ( 82,  0.0)    ( 82,  0.0)
10    (180,  0)    A              A              A              A
13    ( 60, 30)    ( 3.4, -30)    (-3.7, -30)    ( 3.4, -30)    ( 3.4, -3.0)
CPU time           1 hour         1 hour         1 hour         1 hour

* (of correlated noise)




REFERENCES

[GIB50]
J. J. Gibson, The Perception of the Visual World,
Cambridge, Mass., Riverside, 1950.

[GRI81]
W. E. L. Grimson, A Computational Theory of Visual
Surface Interpolation, MIT, A.I. Memo 613, 1981.

[HAN84]
Allen R. Hanson and Edward M. Riseman, A Summary
of Image Understanding Research at the University of
Massachusetts, Technical Report 83-35, Department
of Computer and Information Science, University of
Massachusetts, Amherst, Massachusetts 01003, October 1983.

[LAW83]
Daryl T. Lawton, Processing Translational Motion Sequences,
Computer Graphics and Image Processing,
Vol. 22, pp. 116-144, 1983.

[LAW84]
Daryl T. Lawton, Processing Dynamic Image Sequences
from a Moving Sensor, Ph.D. Dissertation, University
of Massachusetts, Amherst, MA 01003. Also, Technical
Report 84-05, Department of Computer and Information
Science, University of Massachusetts, Amherst,
Massachusetts 01003, February 1984.

[MOR77]
H. P. Moravec, Towards Automatic Visual Obstacle
Avoidance, Proceedings of the 5th IJCAI, MIT,
Cambridge, MA, 1977, p. 584.

[POG84]
T. Poggio and V. Torre, Ill-Posed Problems and
Regularization Analysis in Early Vision, A.I. Memo 773,
MIT, Cambridge, MA, 1984.

[TER84]
D. Terzopoulos, Computation of Visible-Surface
Representations, Ph.D. Dissertation, Massachusetts
Institute of Technology, Jan. 1983.

[WHI80]
T. Whitted, An Improved Illumination Model for Shaded
Display, Communications of the ACM, 23(6), June
1980, pp. 343-349.

[WIL80]
T. D. Williams, Depth from Camera Motion in a Real
World Scene, IEEE Transactions on Pattern Analysis
and Machine Intelligence, PAMI-2(6), November
1980, pp. 511-516.




INHERENT  AMBIGUITIES  IN  RECOVERING  3-D  MOTION 
AND  STRUCTURE  FROM  A  NOISY  FLOW  FIELD 


Gilad Adiv

Computer and Information Science Department
University  of  Massachusetts 
Amherst,  MA  01003 


Abstract 

One of the major areas in research on dynamic scene
analysis is recovering 3-D motion and structure from optical
flow information. Two problems which may arise due
to the presence of noise in the flow field are presented in
this paper. First, motion parameters of the sensor or a
rigidly moving object may be extremely difficult to estimate
because there may exist a large set of significantly
incorrect solutions which induce flow fields similar to the
correct one. The second problem is in the decomposition of
the environment into independently moving objects. Two
such objects may induce optical flows which are compatible
with the same motion parameters and, hence, there is no
way to refute the hypothesis that these flows are generated
by one rigid object. These ambiguities are inherent in the
sense that they are algorithm-independent. Using a mathematical
analysis, we characterise situations where these
problems are likely to arise. A few examples demonstrate
the conclusions. Constraints and parameters which can be
recovered even in ambiguous situations are presented.

1.  Introduction 

Dynamic visual information can be produced by a sensor
moving through the environment and/or by independently
moving objects in the visual field. The interpretation
of such information consists of detecting moving objects,
recovering the motion parameters of the sensor and each
moving object, and structure determination. The results of
this interpretation can be used to control behaviour, as in
robotics or navigation. They can also be integrated, as an
additional knowledge source, into an image understanding
system, such as the VISIONS system [BAN7*].

The most common approach for the analysis of visual
motion is based on two phases: computation of an optical
flow field and interpretation of this field. In the present
discussion, the term 'optical flow field' refers to a 'velocity
field', composed of vectors describing the instantaneous
velocity of image elements. The second phase, i.e., the
interpretation of the optical flow field, is the main concern
of this paper. The information in only one flow field, as
opposed to a time sequence of such fields, is assumed to be
given.

This work was supported by DARPA under Grant
N00014-82-K-0464. The author is now with Rafael,
POB 2250, Haifa 31021, Israel.

Flow fields generated by existing techniques are noisy
and partially incorrect, especially near occlusion or motion
boundaries (see the discussion in [ULL81]). Two problems
may arise due to the presence of noise in the flow field. First,
motion parameters of the sensor or a moving object may be
extremely difficult to estimate because there may exist a
large set of significantly incorrect solutions which induce
flow fields similar to the correct one. The second problem,
which is closely related to the first one, is in the decomposition
of the environment into independently moving objects.
Two such objects may induce optical flows which are compatible
with the same motion parameters and, hence, there
is no way to refute the hypothesis that these flows are generated
by one rigid object. These ambiguities are inherent
in the sense that they are algorithm-independent.

In this paper we will employ mathematical analysis to
characterise situations where these problems are likely to
arise. A few examples will demonstrate the conclusions.
Constraints and parameters which can be recovered even in
ambiguous situations, as well as appropriate modifications
of the interpretation goals, will be presented.

2. Equations Relating the Optical Flow
to 3-D Motion and Structure
2.1 The General Case

Let (X, Y, Z) represent a Cartesian coordinate system
which is fixed with respect to the camera (see Figure 2.1),
and let (x, y) represent a corresponding coordinate system
of a planar image. The image is assumed to be a square and
the field of view (FOV) is defined to be the visual angle
corresponding to each side of the image, which, therefore, is
$2\tan(fov/2)$ focal units. The focal length, from the nodal
point O to the image, is assumed to be known and, without
loss of generality, it can be normalised to 1. Thus, the
perspective projection (x, y) on the image of a point (X, Y, Z)
in the environment is:

$$x = X/Z, \qquad y = Y/Z. \tag{2.1a,b}$$

Figure 2.1 (redrawn from [LON80]): A coordinate
system (X, Y, Z) attached to the camera,
and the corresponding image coordinates
(x, y). The image position (x, y) is the perspective
projection of the point (X, Y, Z) in the environment.
$T = (T_X, T_Y, T_Z)$ and $\Omega = (\Omega_X, \Omega_Y, \Omega_Z)$
represent the relative translation and rotation
of a given object in the scene.

The motion, relative to the camera, of a rigid object in
the scene can be decomposed into two components: translation
$T = (T_X, T_Y, T_Z)$ and rotation $\Omega = (\Omega_X, \Omega_Y, \Omega_Z)$
(these symbols represent instantaneous spatial velocities).
If (X, Y, Z) are the instantaneous camera coordinates of
a point on the object, then the corresponding projection
(x, y) on the image moves with a velocity $(\alpha, \beta)$, where
[LON80]:

$$\alpha = -\Omega_X xy + \Omega_Y (1 + x^2) - \Omega_Z y + (T_X - T_Z x)/Z, \tag{2.2a}$$

$$\beta = -\Omega_X (1 + y^2) + \Omega_Y xy + \Omega_Z x + (T_Y - T_Z y)/Z. \tag{2.2b}$$

Notice that $(\alpha, \beta)$ can be represented as the sum

$$(\alpha, \beta) = (\alpha_R, \beta_R) + (\alpha_T, \beta_T), \tag{2.3}$$

where $(\alpha_R, \beta_R)$ and $(\alpha_T, \beta_T)$ are, respectively, the rotational
and translational components of the velocity field:

$$\alpha_R = -\Omega_X xy + \Omega_Y (1 + x^2) - \Omega_Z y, \qquad \alpha_T = (T_X - T_Z x)/Z, \tag{2.4a,b}$$

$$\beta_R = -\Omega_X (1 + y^2) + \Omega_Y xy + \Omega_Z x, \qquad \beta_T = (T_Y - T_Z y)/Z. \tag{2.4c,d}$$

Given a flow field, we wish to estimate the recoverable
motion parameters of each rigid object, relative to
the camera. These parameters are the rotation parameters
$(\Omega_X, \Omega_Y, \Omega_Z)$, and the direction of the translation vector,
defined by the unit vector $U = T/r$, where $r$ is the length
of $T$. The magnitude $r$ itself cannot be recovered, because
the translation enters equations (2.2) only through the
ratios $T_X/Z$, $T_Y/Z$ and $T_Z/Z$. Notice that a stationary
environment is considered to be a rigid object moving
relative to the camera. Once the motion parameters are
recovered, it is also possible to estimate the relative depth,
$Z(x, y)/r$, corresponding to each pixel (x, y) where a flow
vector is defined, unless $r = 0$ or the location of the vector
is exactly in the focus of expansion (FOE).
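
As an illustration of equations (2.2) and of the relative-depth remark above, the following is a minimal Python sketch. The function names `flow` and `relative_depth` are hypothetical and are not taken from the paper; the depth estimate uses a simple least-squares ratio between the measured translational component and the direction $(U_X - U_Z x,\; U_Y - U_Z y)$, which is one possible choice rather than the author's stated procedure.

```python
import math


def flow(x, y, Z, T, Omega):
    """Image velocity (alpha, beta) at (x, y) for depth Z, per equations (2.2a,b)."""
    TX, TY, TZ = T
    OX, OY, OZ = Omega
    alpha = -OX * x * y + OY * (1 + x * x) - OZ * y + (TX - TZ * x) / Z
    beta = -OX * (1 + y * y) + OY * x * y + OZ * x + (TY - TZ * y) / Z
    return alpha, beta


def relative_depth(x, y, alpha, beta, U, Omega):
    """Estimate Z / r from one flow vector, given U = T / r and Omega.

    Undefined (division by zero) when r = 0 or when (x, y) is the FOE,
    matching the two exceptions named in the text.
    """
    UX, UY, UZ = U
    # Rotational component: equations (2.4a,c), obtained here with T = 0.
    rot_a, rot_b = flow(x, y, 1.0, (0.0, 0.0, 0.0), Omega)
    ta, tb = alpha - rot_a, beta - rot_b      # equals (r / Z) * (da, db)
    da, db = UX - UZ * x, UY - UZ * y         # vanishes at the FOE
    return (da * da + db * db) / (ta * da + tb * db)
```

Because only $U$ and $\Omega$ are passed to `relative_depth`, the result is the depth in units of the unknown translation magnitude $r$, not an absolute depth.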

2.2 The Planar Case

In this section we examine the flow field induced by a
rigid motion of a planar surface. Excluding the degenerate
case in which the same plane contains both the surface and
the nodal point (and, therefore, the corresponding region in
the image is a straight line), the surface can be represented
by the equation

$$k_X X + k_Y Y + k_Z Z = 1. \tag{2.5}$$

The coefficients $k_X$, $k_Y$ and $k_Z$ can be any real numbers,
except the case in which all of them are zero. Introducing
the notations $k = (k_X, k_Y, k_Z)$ and $l = rk$, and using
equation (2.1), we obtain:

$$r/Z = l_X x + l_Y y + l_Z. \tag{2.6}$$

Using the identity $T = rU$ and substituting (2.6) in (2.2),
we realise that, given a relative motion $\{T, \Omega\}$, the flow
field is:

$$\alpha = a_1 + a_2 x + a_3 y + a_7 x^2 + a_8 xy, \tag{2.7a}$$

$$\beta = a_4 + a_5 x + a_6 y + a_7 xy + a_8 y^2, \tag{2.7b}$$

where:

$$a_1 = \Omega_Y + l_Z U_X, \tag{2.8a}$$
$$a_2 = l_X U_X - l_Z U_Z, \tag{2.8b}$$
$$a_3 = -\Omega_Z + l_Y U_X, \tag{2.8c}$$
$$a_4 = -\Omega_X + l_Z U_Y, \tag{2.8d}$$
$$a_5 = \Omega_Z + l_X U_Y, \tag{2.8e}$$
$$a_6 = l_Y U_Y - l_Z U_Z, \tag{2.8f}$$
$$a_7 = \Omega_Y - l_X U_Z, \tag{2.8g}$$
$$a_8 = -\Omega_X - l_Y U_Z. \tag{2.8h}$$

Equations (2.7) represent what we shall call a $\Psi$ transformation.
They describe a 2-D motion in the image plane,
represented by the 8 parameters $a_1, \ldots, a_8$. Note that a
similar representation of the optical flow produced by a
moving planar surface is introduced in [WAX83].
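
The eight coefficients of equations (2.8) can be checked numerically against the general flow equations (2.2). The sketch below is illustrative only: the function names are hypothetical, and the general flow is written with $T = rU$ and $r/Z$ taken from equation (2.6), so that no explicit depth is needed.

```python
def psi_coefficients(U, Omega, l):
    """Coefficients a1..a8 of the planar-surface flow, equations (2.8a-h)."""
    UX, UY, UZ = U
    OX, OY, OZ = Omega
    lX, lY, lZ = l
    return (
        OY + lZ * UX,        # a1
        lX * UX - lZ * UZ,   # a2
        -OZ + lY * UX,       # a3
        -OX + lZ * UY,       # a4
        OZ + lX * UY,        # a5
        lY * UY - lZ * UZ,   # a6
        OY - lX * UZ,        # a7
        -OX - lY * UZ,       # a8
    )


def psi_flow(x, y, a):
    """Quadratic flow of equations (2.7a,b)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    alpha = a1 + a2 * x + a3 * y + a7 * x * x + a8 * x * y
    beta = a4 + a5 * x + a6 * y + a7 * x * y + a8 * y * y
    return alpha, beta


def planar_flow(x, y, U, Omega, l):
    """Equations (2.2) with T = r U and r / Z = lX x + lY y + lZ (2.6)."""
    UX, UY, UZ = U
    OX, OY, OZ = Omega
    r_over_Z = l[0] * x + l[1] * y + l[2]
    alpha = -OX * x * y + OY * (1 + x * x) - OZ * y + (UX - UZ * x) * r_over_Z
    beta = -OX * (1 + y * y) + OY * x * y + OZ * x + (UY - UZ * y) * r_over_Z
    return alpha, beta
```

For any image point, `psi_flow(x, y, psi_coefficients(U, Omega, l))` and `planar_flow(x, y, U, Omega, l)` should agree up to rounding, which is the content of the substitution described above.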


Given a flow field induced by a rigid object, it will be
shown that in certain situations the flow field induced by
totally incorrect motion and structure may be similar to the
correct one. In the presence of noise which is statistically
larger than the difference between these flow fields, it may
be impossible to obtain reasonably accurate estimates of
the motion parameters. The influence of certain factors on
this ambiguity will be analysed.

Let us start by examining the cases of pure rotation
and pure translation. In a purely rotational motion the
flow field is represented by equations (2.4a,c) which can be
rewritten as

$$\begin{pmatrix} \alpha_R \\ \beta_R \end{pmatrix} = \Omega_X \begin{pmatrix} -xy \\ -(1 + y^2) \end{pmatrix} + \Omega_Y \begin{pmatrix} 1 + x^2 \\ xy \end{pmatrix} + \Omega_Z \begin{pmatrix} -y \\ x \end{pmatrix}. \tag{3.1}$$

Thus, each rotation parameter has a distinct signature in
the flow field, and in most cases it can reliably be recovered.

In a purely translational motion, the direction of translation
is represented by the focus of expansion (FOE) which,
in this case, is the intersection of the straight lines corresponding
to the flow vectors. Usually, the FOE can be robustly
recovered [LAW84], unless the absolute value of the
translation is small relative to the distance of the surface
from the observer, in which case the flow vectors are small
and the determination of the corresponding intersection is
sensitive to noise.

In the general case, an ambiguity in determining the
motion parameters becomes a much more severe problem
because rotation and translation may induce similar flows.
To demonstrate this, let us examine the case of a planar
surface which is parallel to the image plane, and denote by
$d$ the distance of this plane from the camera. We wish now
to compare the flow field generated by a purely translational
motion $(T_X, T_Y, 0)$ to the flow field generated by the purely
rotational motion $(-T_Y/d,\; T_X/d,\; 0)$. The flow field in the
first case is

$$\begin{pmatrix} \alpha_T \\ \beta_T \end{pmatrix} = \frac{1}{d} \begin{pmatrix} T_X \\ T_Y \end{pmatrix}, \tag{3.2}$$

while in the second case the flow is given by

$$\begin{pmatrix} \alpha_R \\ \beta_R \end{pmatrix} = \frac{1}{d} \begin{pmatrix} xy\,T_Y + (1 + x^2)\,T_X \\ (1 + y^2)\,T_Y + xy\,T_X \end{pmatrix}. \tag{3.3}$$

Hence,

$$\begin{pmatrix} \alpha_R \\ \beta_R \end{pmatrix} - \begin{pmatrix} \alpha_T \\ \beta_T \end{pmatrix} = \frac{1}{d} \begin{pmatrix} x^2\,T_X + xy\,T_Y \\ y^2\,T_Y + xy\,T_X \end{pmatrix}. \tag{3.4}$$

If the field of view (FOV) is small, then the second-order
terms of the image coordinates, x and y, are small, and
the difference between the flow fields is small as well (see
Figure 3.1). In such a case it may be very difficult, in the
presence of noise, to distinguish between these fields and
to determine whether the motion is purely translational,
purely rotational or a combination of both. Note, however,
that if the FOV is large, then the second-order components
of the flow field are relatively large and, therefore, the
ambiguity is more likely to be resolved.
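
To make the dependence on the field of view concrete, the following hedged sketch compares the purely translational field of a fronto-parallel plane with the purely rotational field described above, and reports the largest difference relative to the magnitude of the translational flow. The function names, the sampling grid, and the default values of `TX`, `TY` and `d` are illustrative assumptions, not values from the paper.

```python
import math


def translational_flow(x, y, TX, TY, d):
    """Flow of translation (TX, TY, 0) over a plane at depth d parallel to the image."""
    return TX / d, TY / d


def rotational_flow(x, y, TX, TY, d):
    """Flow of the rotation (-TY/d, TX/d, 0), from equations (2.4a,c)."""
    OX, OY = -TY / d, TX / d
    alpha = -OX * x * y + OY * (1 + x * x)
    beta = -OX * (1 + y * y) + OY * x * y
    return alpha, beta


def max_relative_difference(fov_deg, TX=1.0, TY=0.5, d=10.0, steps=21):
    """Largest |rotational - translational| / |translational| over the image."""
    half = math.tan(math.radians(fov_deg) / 2)   # half image side, focal units
    worst = 0.0
    for i in range(steps):
        for j in range(steps):
            x = -half + 2 * half * i / (steps - 1)
            y = -half + 2 * half * j / (steps - 1)
            at, bt = translational_flow(x, y, TX, TY, d)
            ar, br = rotational_flow(x, y, TX, TY, d)
            diff = math.hypot(ar - at, br - bt)
            worst = max(worst, diff / math.hypot(at, bt))
    return worst
```

Under these assumptions the relative difference grows with the square of the half image side, so a narrow field of view leaves the two fields nearly indistinguishable while a wide one separates them near the image boundary.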


Ambiguity in determining the motion parameters is
affected not only by the FOV, but also by variations in the
surface structure. This can be concluded from the work of
Rieger and Lawton [RIE83], who examined the case of large
discontinuities in the depth map. In this case the differences
between flow vectors near occlusion boundaries are oriented
towards the FOE of the translational field and, therefore,
the ambiguity in distinguishing between the translational
and rotational components can be resolved.

Figure 3.1: The flow field in (a) is purely translational,
whereas the flow field in (b) is purely rotational.
In both cases the field of view is 60°. Note
the similarity of the flow fields in the central portion
of the image, where the values of x and y are
small. On the other hand, the difference between
the flow fields is not negligible near the boundary
of the image.


In addition, it has been experimentally shown [PRA80,
LON81, FAN83b] that the accuracy of the estimated 3-D
motion parameters is improved when the translational component
of the motion is large relative to the distance of the
object from the sensor. The results can also be improved by
using a large number of flow vectors [ROA80, TSA84], and
by increasing the size of the region containing these vectors
[PRA80, FAN83a].

To summarise, ambiguity (or, using another term, instability)
in determining 3-D information can be expected
if the FOV, the depth variations, and the ratio of the translation
to the distance of the object from the camera are
all small. In addition, local techniques, in which only the
information in a small region of the image is utilised, are
more sensitive to noise. In the next section we will employ
a mathematical analysis in order to show that these conditions,
as well as other conditions, generally contribute to
ambiguity in recovering 3-D motion and structure. Combined
together these are sufficient conditions for such instability.
Note that the error analyses existing in the literature
are experimental and algorithm-dependent, whereas
here we develop a mathematical and algorithm-independent
analysis.

3.2 Mathematical Analysis
3.2.1 A planar surface

In this section we restrict ourselves to flow fields induced
by a rigid motion of a planar surface given by equation
(2.5): $k_X X + k_Y Y + k_Z Z = 1$. Since

$$Z = \frac{1}{k_Z} - \frac{k_X}{k_Z} X - \frac{k_Y}{k_Z} Y,$$

$1/k_Z$, denoted by $d$, is the distance from the camera to the
surface along the line of sight, and the values $-k_X/k_Z$ and
$-k_Y/k_Z$, denoted, respectively, by $s_X$ and $s_Y$, represent
the slopes of the surface relative to the image plane.

In the following analysis, the image is assumed to be
square and the FOV is defined to be the visual angle corresponding
to each side of the image, which, therefore, is
$2\tan(fov/2)$ focal units. The region $\mathcal{R}$ corresponding to
the perspective projection of the planar patch on the image
is contained in a square for which the proportion between
its side and the image side is $\gamma$, where $0 < \gamma < 1$. For
simplicity, the center of this square is assumed to be (0, 0),
but even if this is not the case, results similar to those which
we will obtain in this section can be derived by expanding
the flow equations around this center. $\gamma$, which we shall
call the locality factor, will be small if the ratio of the object
size to its distance from the camera is small relative to the
image size (in focal units), or if a technique based on a local
analysis of the flow field is employed.

The flow field generated by the motion parameters
$\{T, \Omega\}$ can be described by a $\Psi$ transformation (equations
(2.7)) with the 8 coefficients given in equations (2.8).
Employing the constraint $U_X^2 + U_Y^2 + U_Z^2 = 1$ in addition, we
obtain 9 non-linear equations with 3 unknown vectors (9
scalar unknowns): $U$, $\Omega$ and $l$. Usually, these equations
have 2 sets of solutions [TSA84, WAX83], where, of course,
only one of them is the correct one. Let us now denote by
$\hat{U}$, $\hat{\Omega}$ and $\hat{l}$ estimates of the
motion and structure parameter values. We will show that
in some situations, vectors $\hat{U}$ significantly different from
the corresponding values in each of the two exact solutions
produce flow fields which are very similar to the correct one,
if combined with appropriate values of $\hat{\Omega}$ and $\hat{l}$.

The basic idea is that if the region $\mathcal{R}$ is rather small
(in focal units) and $l_X$, $l_Y$ are not large, then, based on
equations (2.7) and (2.8g,h), a change in $\hat{U}$ has only a
small effect on the second-order components of the flow
field. Therefore, given an arbitrary $\hat{U}$, we concentrate
on the lower-order components, and try to find $\hat{\Omega}$ and $\hat{l}$
such that the correct values of the coefficients $a_1, \ldots, a_6$
will be maintained. This will lead to hypothesised values
of the motion and structure parameters. We can substitute
these parameters in the expressions for $a_7$ and $a_8$ and measure
the deviation of the obtained values from the correct
ones. These deviations determine what we shall call the error
field, that is, the discrepancy between the correct flow
field and that predicted from the hypothesised parameters.
Note that this error field is actually an upper bound to the
'minimal' error field which could be obtained for the same
$\hat{U}$ by optimising the values of $\hat{\Omega}$ and $\hat{l}$ across all the eight
coefficients instead of $a_1, \ldots, a_6$.

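Given a vector $\hat{U}$, the equations (2.8a) to (2.8f) associated
with the coefficients $a_1, \ldots, a_6$ produce six linear
equations with six unknowns: $\hat{\Omega}_X$, $\hat{\Omega}_Y$, $\hat{\Omega}_Z$, $\hat{l}_X$, $\hat{l}_Y$ and
$\hat{l}_Z$. These equations can be represented by

$$F v = a, \tag{3.6a}$$

where

$$F = \begin{pmatrix}
-1 & 0 & 0 & 0 & 0 & \hat{U}_Y \\
0 & 1 & 0 & 0 & 0 & \hat{U}_X \\
0 & 0 & -1 & 0 & \hat{U}_X & 0 \\
0 & 0 & 1 & \hat{U}_Y & 0 & 0 \\
0 & 0 & 0 & \hat{U}_X & 0 & -\hat{U}_Z \\
0 & 0 & 0 & 0 & \hat{U}_Y & -\hat{U}_Z
\end{pmatrix}, \tag{3.6b}$$

$$v = (\hat{\Omega}_X, \hat{\Omega}_Y, \hat{\Omega}_Z, \hat{l}_X, \hat{l}_Y, \hat{l}_Z)^T, \qquad a = (a_4, a_1, a_3, a_5, a_2, a_6)^T. \tag{3.6c,d}$$

A unique solution is guaranteed if the determinant of
$F$, denoted by $D$, is non-zero. As can easily be verified,

$$D = \hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2). \tag{3.7}$$

Thus, if the translation vector is not exactly perpendicular
($\hat{U}_X = \hat{U}_Y = 0$), or parallel ($\hat{U}_Z = 0$), to the image plane,
then there exist $\hat{\Omega}$ and $\hat{l}$ which keep the correct values of
$a_1, \ldots, a_6$. Note, however, that this solution may still be
not physically realisable if the depth constraint $Z > 0$ is
not satisfied by one of the depth values predicted by the
vector $\hat{l}$. We will return to this problem at the end of this
section.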
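
The determinant formula can be checked mechanically. The sketch below builds the 6 x 6 coefficient matrix of equations (2.8a) to (2.8f) for the unknowns $(\hat{\Omega}_X, \hat{\Omega}_Y, \hat{\Omega}_Z, \hat{l}_X, \hat{l}_Y, \hat{l}_Z)$ and evaluates its determinant exactly with rational arithmetic. The row order $(a_4, a_1, a_3, a_5, a_2, a_6)$ is an assumption made here; another row order could flip the sign of the determinant but not its magnitude. The names `build_F` and `det` are hypothetical.

```python
from fractions import Fraction


def build_F(U_hat):
    """Coefficient matrix of (2.8a)-(2.8f); rows ordered a4, a1, a3, a5, a2, a6."""
    UX, UY, UZ = U_hat
    return [
        [-1, 0, 0, 0, 0, UY],     # a4 = -Omega_X + l_Z U_Y
        [0, 1, 0, 0, 0, UX],      # a1 =  Omega_Y + l_Z U_X
        [0, 0, -1, 0, UX, 0],     # a3 = -Omega_Z + l_Y U_X
        [0, 0, 1, UY, 0, 0],      # a5 =  Omega_Z + l_X U_Y
        [0, 0, 0, UX, 0, -UZ],    # a2 =  l_X U_X - l_Z U_Z
        [0, 0, 0, 0, UY, -UZ],    # a6 =  l_Y U_Y - l_Z U_Z
    ]


def det(m):
    """Determinant by Laplace expansion along the first row (fine for 6 x 6)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        if value == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


U_hat = (Fraction(1, 3), Fraction(2, 3), Fraction(2, 3))   # a unit vector
D = det(build_F(U_hat))
expected = U_hat[2] * (U_hat[0] ** 2 + U_hat[1] ** 2)
```

With this row order `D` and `expected` coincide, and both vanish when the hypothesised translation is parallel or perpendicular to the image plane.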

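The solution of the equation set (3.6) is, if $D \neq 0$,

$$(\hat{\Omega}_X, \hat{\Omega}_Y, \hat{\Omega}_Z, \hat{l}_X, \hat{l}_Y, \hat{l}_Z)^T = F^{-1} a. \tag{3.8}$$

To find $F^{-1}$ we employ the decomposition technique in
[RAL65]. We represent $F$ by

$$F = \begin{pmatrix} F_1 & F_2 \\ F_3 & F_4 \end{pmatrix}, \tag{3.9}$$

where each of $F_i$ ($i = 1, 2, 3, 4$) is a $3 \times 3$ matrix. Then

$$F^{-1} = \begin{pmatrix} G_1 & G_2 \\ G_3 & G_4 \end{pmatrix}, \tag{3.10}$$

where

$$G_1 = F_1^{-1} - F_1^{-1} F_2 G_3, \tag{3.11a}$$
$$G_2 = -F_1^{-1} F_2 G_4, \tag{3.11b}$$
$$G_3 = -G_4 F_3 F_1^{-1}, \tag{3.11c}$$
$$G_4 = (F_4 - F_3 F_1^{-1} F_2)^{-1}. \tag{3.11d}$$

Necessary and sufficient conditions for applying this technique
are that $F_1$ and $F_4 - F_3 F_1^{-1} F_2$ are non-singular matrices.
These conditions are satisfied if $F$ is non-singular.

Employing the decomposition technique, we can now
obtain

$$F^{-1} = \frac{1}{D} \begin{pmatrix}
-D & 0 & \hat{U}_X \hat{U}_Y^2 & \hat{U}_X \hat{U}_Y^2 & -\hat{U}_Y^3 & -\hat{U}_X^2 \hat{U}_Y \\
0 & D & -\hat{U}_X^2 \hat{U}_Y & -\hat{U}_X^2 \hat{U}_Y & \hat{U}_X \hat{U}_Y^2 & \hat{U}_X^3 \\
0 & 0 & -\hat{U}_Y^2 \hat{U}_Z & \hat{U}_X^2 \hat{U}_Z & -\hat{U}_X \hat{U}_Y \hat{U}_Z & \hat{U}_X \hat{U}_Y \hat{U}_Z \\
0 & 0 & \hat{U}_Y \hat{U}_Z & \hat{U}_Y \hat{U}_Z & \hat{U}_X \hat{U}_Z & -\hat{U}_X \hat{U}_Z \\
0 & 0 & \hat{U}_X \hat{U}_Z & \hat{U}_X \hat{U}_Z & -\hat{U}_Y \hat{U}_Z & \hat{U}_Y \hat{U}_Z \\
0 & 0 & \hat{U}_X \hat{U}_Y & \hat{U}_X \hat{U}_Y & -\hat{U}_Y^2 & -\hat{U}_X^2
\end{pmatrix}. \tag{3.12}$$

Substituting the definitions of $a_1, \ldots, a_6$ in (3.8),

$$\begin{pmatrix} \hat{\Omega}_X \\ \hat{\Omega}_Y \\ \hat{\Omega}_Z \\ \hat{l}_X \\ \hat{l}_Y \\ \hat{l}_Z \end{pmatrix}
= F^{-1} \begin{pmatrix} a_4 \\ a_1 \\ a_3 \\ a_5 \\ a_2 \\ a_6 \end{pmatrix}
= F^{-1} \begin{pmatrix}
-\Omega_X + U_Y l_Z \\
\Omega_Y + U_X l_Z \\
-\Omega_Z + U_X l_Y \\
\Omega_Z + U_Y l_X \\
U_X l_X - U_Z l_Z \\
U_Y l_Y - U_Z l_Z
\end{pmatrix}, \tag{3.13}$$

where $U$, $\Omega$ and $l$ are the correct values of the motion and
structure parameters. Multiplying $F^{-1}$ by $a$, and using
the notations $\epsilon_{XY} = \hat{U}_X U_Y - \hat{U}_Y U_X$, $\epsilon_{XZ} = \hat{U}_X U_Z - \hat{U}_Z U_X$
and $\epsilon_{YZ} = \hat{U}_Y U_Z - \hat{U}_Z U_Y$, we can derive:

$$\hat{\Omega}_X = \Omega_X + \frac{\hat{U}_Y \epsilon_{XY} (\hat{U}_Y l_X - \hat{U}_X l_Y)}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)} + \frac{\epsilon_{YZ}}{\hat{U}_Z} l_Z, \tag{3.14a}$$

$$\hat{\Omega}_Y = \Omega_Y + \frac{\hat{U}_X \epsilon_{XY} (\hat{U}_X l_Y - \hat{U}_Y l_X)}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)} - \frac{\epsilon_{XZ}}{\hat{U}_Z} l_Z, \tag{3.14b}$$

$$\hat{\Omega}_Z = \Omega_Z + \frac{\epsilon_{XY} (\hat{U}_X l_X + \hat{U}_Y l_Y)}{\hat{U}_X^2 + \hat{U}_Y^2}, \tag{3.14c}$$

$$\hat{l}_X = \frac{(\hat{U}_X U_X + \hat{U}_Y U_Y)\, l_X - \epsilon_{XY} l_Y}{\hat{U}_X^2 + \hat{U}_Y^2}, \tag{3.14d}$$

$$\hat{l}_Y = \frac{(\hat{U}_X U_X + \hat{U}_Y U_Y)\, l_Y + \epsilon_{XY} l_X}{\hat{U}_X^2 + \hat{U}_Y^2}, \tag{3.14e}$$

$$\hat{l}_Z = \frac{U_Z}{\hat{U}_Z} l_Z + \frac{\epsilon_{XY} (\hat{U}_Y l_X - \hat{U}_X l_Y)}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)}. \tag{3.14f}$$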
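
As a cross-check of the closed-form expressions (3.14), the sketch below solves the six linear equations directly by exact Gauss-Jordan elimination and compares the result with the closed forms. It is an illustration under stated assumptions: the row order $(a_4, a_1, a_3, a_5, a_2, a_6)$, the helper names, and the sample parameter values are all chosen here and do not come from the paper.

```python
from fractions import Fraction as Fr


def build_F(Uh):
    """Rows are equations (2.8d), (2.8a), (2.8c), (2.8e), (2.8b), (2.8f)."""
    hX, hY, hZ = Uh
    return [[-1, 0, 0, 0, 0, hY], [0, 1, 0, 0, 0, hX], [0, 0, -1, 0, hX, 0],
            [0, 0, 1, hY, 0, 0], [0, 0, 0, hX, 0, -hZ], [0, 0, 0, 0, hY, -hZ]]


def solve(A, b):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def right_hand_side(U, Om, l):
    """(a4, a1, a3, a5, a2, a6) computed from the correct parameters."""
    UX, UY, UZ = U
    OX, OY, OZ = Om
    lX, lY, lZ = l
    return [-OX + UY * lZ, OY + UX * lZ, -OZ + UX * lY,
            OZ + UY * lX, UX * lX - UZ * lZ, UY * lY - UZ * lZ]


def closed_form(Uh, U, Om, l):
    """Equations (3.14a-f)."""
    hX, hY, hZ = Uh
    UX, UY, UZ = U
    OX, OY, OZ = Om
    lX, lY, lZ = l
    eXY, eXZ, eYZ = hX * UY - hY * UX, hX * UZ - hZ * UX, hY * UZ - hZ * UY
    S, dot = hX * hX + hY * hY, hX * UX + hY * UY
    return [
        OX + hY * eXY * (hY * lX - hX * lY) / (hZ * S) + eYZ * lZ / hZ,
        OY + hX * eXY * (hX * lY - hY * lX) / (hZ * S) - eXZ * lZ / hZ,
        OZ + eXY * (hX * lX + hY * lY) / S,
        (dot * lX - eXY * lY) / S,
        (dot * lY + eXY * lX) / S,
        UZ * lZ / hZ + eXY * (hY * lX - hX * lY) / (hZ * S),
    ]


U = (Fr(1, 3), Fr(2, 3), Fr(2, 3))        # correct translation axis
Uh = (Fr(2, 3), Fr(1, 3), Fr(2, 3))       # hypothesised translation axis
Om = (Fr(1, 10), Fr(-1, 5), Fr(3, 10))
l = (Fr(1, 4), Fr(-1, 2), Fr(1))
numeric = solve(build_F(Uh), right_hand_side(U, Om, l))
```

Here `numeric` and `closed_form(Uh, U, Om, l)` agree exactly, and choosing `Uh = U` returns the correct `Om` and `l` unchanged, as expected when all the $\epsilon$ terms vanish.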


The error field corresponding to the values $\hat{U}$, $\hat{\Omega}$ and
$\hat{l}$ is the deviation between the flow field predicted by these
values and the correct field. Referring to equations (2.7),
its value at the (x, y) pixel is:

$$\begin{pmatrix} \Delta\alpha_\Psi \\ \Delta\beta_\Psi \end{pmatrix} = \begin{pmatrix} \Delta a_7\, x^2 + \Delta a_8\, xy \\ \Delta a_7\, xy + \Delta a_8\, y^2 \end{pmatrix}, \tag{3.15}$$

where $\Delta a_7$ and $\Delta a_8$ are the errors induced in the coefficients
$a_7$ and $a_8$. Recall now that the distance, denoted by
$d$, from the camera to the surface along the line of sight is
$1/k_Z$, and the slopes, denoted by $s_X$ and $s_Y$, of the surface
relative to the image plane are, respectively, $-k_X/k_Z$
and $-k_Y/k_Z$. Hence,

$$r/d = l_Z, \qquad s_X = -l_X/l_Z \qquad \text{and} \qquad s_Y = -l_Y/l_Z. \tag{3.16a,b,c}$$

Thus, using equations (2.8g,h) and (3.14), we can obtain:

$$\begin{aligned}
\Delta a_7 &= (\hat{\Omega}_Y - \hat{U}_Z \hat{l}_X) - (\Omega_Y - U_Z l_X) \\
&= \frac{\hat{U}_X \epsilon_{XZ} + \hat{U}_Y \epsilon_{YZ}}{\hat{U}_X^2 + \hat{U}_Y^2}\, l_X + \frac{-\hat{U}_X \hat{U}_Y l_X + (1 - \hat{U}_Y^2)\, l_Y}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)}\, \epsilon_{XY} - \frac{\epsilon_{XZ}}{\hat{U}_Z}\, l_Z \\
&= -\frac{r}{d} \left[ \frac{\hat{U}_X \epsilon_{XZ} + \hat{U}_Y \epsilon_{YZ}}{\hat{U}_X^2 + \hat{U}_Y^2}\, s_X + \frac{-\hat{U}_X \hat{U}_Y s_X + (1 - \hat{U}_Y^2)\, s_Y}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)}\, \epsilon_{XY} + \frac{\epsilon_{XZ}}{\hat{U}_Z} \right]
\end{aligned} \tag{3.17a}$$

and

$$\begin{aligned}
\Delta a_8 &= (-\hat{\Omega}_X - \hat{U}_Z \hat{l}_Y) - (-\Omega_X - U_Z l_Y) \\
&= \frac{\hat{U}_X \epsilon_{XZ} + \hat{U}_Y \epsilon_{YZ}}{\hat{U}_X^2 + \hat{U}_Y^2}\, l_Y + \frac{(\hat{U}_X^2 - 1)\, l_X + \hat{U}_X \hat{U}_Y l_Y}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)}\, \epsilon_{XY} - \frac{\epsilon_{YZ}}{\hat{U}_Z}\, l_Z \\
&= -\frac{r}{d} \left[ \frac{\hat{U}_X \epsilon_{XZ} + \hat{U}_Y \epsilon_{YZ}}{\hat{U}_X^2 + \hat{U}_Y^2}\, s_Y + \frac{(\hat{U}_X^2 - 1)\, s_X + \hat{U}_X \hat{U}_Y s_Y}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)}\, \epsilon_{XY} + \frac{\epsilon_{YZ}}{\hat{U}_Z} \right].
\end{aligned} \tag{3.17b}$$

Therefore, if the translation is not large relative to the distance
of the surface from the camera (i.e., $r/d$ is small),
and the surface is not very slanted (i.e., $s_X$ and $s_Y$ are
small), then $\Delta a_7$ and $\Delta a_8$ are not large for vectors $\hat{U}$ in
a relatively large neighborhood of $U$. If, in addition, the
region $\mathcal{R}$ is small, then x and y are small as well, and
thus the deviation $(\Delta\alpha_\Psi, \Delta\beta_\Psi)$ is very small. Under these
conditions, any error surface corresponding to possible values
of $\hat{U}$ can be expected to be very flat around the correct
solution $U$ and, therefore, the process of recovering 3-D
motion and structure will be very unstable and sensitive to
noise.

To determine more precisely how these instability problems
depend on factors related to the camera and the flow
field, we will normalise the error field given in equation
(3.15) to the noise level and thus, for each vector $\hat{U}$, obtain
a measure of a signal-to-noise ratio (SNR). Note that
the error field is used as a 'signal' measure, since high values
of this field reduce ambiguity. The probability that the vector
$\hat{U}$ will be selected as the correct solution is a decreasing
function of the corresponding SNR values. Hence, if these
values are small for a large set of translation axes, then
instability in recovering 3-D information can be expected.

The noise in the flow field is assumed to be additive, its
expectation is 0 and its standard deviation, in focal units,
is

$$\sigma_f = \sigma_p \frac{2\tan(fov/2)}{N}, \tag{3.18}$$

where the image contains $N \times N$ pixels and $\sigma_p$ is the standard
deviation in pixel units. To obtain an SNR measure
we divide the error field by the square root of the sum of the
second moments of the noise samples, which are assumed
to be independent, in both axes. Thus, for each pixel (x, y)
where a flow vector is defined,

$$snr(x, y) = \frac{\sqrt{(\Delta a_7 x^2 + \Delta a_8 xy)^2 + (\Delta a_7 xy + \Delta a_8 y^2)^2}}{2\sqrt{2}\, \sigma_p \tan(fov/2)/N}. \tag{3.19}$$

Employing the definition, in the beginning of this section,
of the locality factor $\gamma$, x and y satisfy the inequalities:

$$|x| \le \gamma \tan(fov/2), \qquad |y| \le \gamma \tan(fov/2), \tag{3.20a,b}$$

and, therefore,

$$snr(x, y) \le \frac{N \gamma^2 \tan(fov/2)\, (|\Delta a_7| + |\Delta a_8|)}{2 \sigma_p}. \tag{3.21}$$


Even when the SNR values are small, it may be possible
to successfully recover the desired parameters, if there
exist many flow vectors and the noise samples associated
with them are independent. However, in many cases, especially
if $\gamma$ and $N$ are small and the flow field is sparse,
the number of flow vectors is small. In addition, if the flow
field is dense, then usually the noise samples in neighboring
pixels are highly correlated. This is the case, for example,
if the noise is induced by a round-off error.
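
The signal-to-noise measure and its upper bound can be evaluated directly. The following sketch is illustrative only: it assumes the SNR is the magnitude of the planar error field divided by $2\sqrt{2}\,\sigma_p \tan(fov/2)/N$, and that the bound is $N\gamma^2 \tan(fov/2)(|\Delta a_7| + |\Delta a_8|)/(2\sigma_p)$, as read from the surrounding derivation. Function and parameter names are hypothetical.

```python
import math


def snr(x, y, da7, da8, sigma_p, fov_deg, N):
    """Error-field magnitude at (x, y) divided by the noise level in focal units."""
    t = math.tan(math.radians(fov_deg) / 2)
    err = math.hypot(da7 * x * x + da8 * x * y, da7 * x * y + da8 * y * y)
    noise = 2 * math.sqrt(2) * sigma_p * t / N    # sqrt(2) * sigma_f
    return err / noise


def snr_bound(da7, da8, sigma_p, fov_deg, N, gamma):
    """Upper bound on snr over the square |x|, |y| <= gamma * tan(fov / 2)."""
    t = math.tan(math.radians(fov_deg) / 2)
    return N * gamma ** 2 * t * (abs(da7) + abs(da8)) / (2 * sigma_p)


def max_snr_on_grid(da7, da8, sigma_p, fov_deg, N, gamma, steps=9):
    """Largest snr on a regular grid covering the region's bounding square."""
    half = gamma * math.tan(math.radians(fov_deg) / 2)
    points = [-half + 2 * half * i / (steps - 1) for i in range(steps)]
    return max(snr(x, y, da7, da8, sigma_p, fov_deg, N)
               for x in points for y in points)
```

The bound scales with $\gamma^2$, so halving the locality factor lowers the attainable SNR by a factor of four for the same coefficient errors, noise level and resolution.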


Before we can conclude this section we still have to
deal with the depth constraint. This constraint is satisfied
if, for any pixel (x, y) in the region $\mathcal{R}$, the estimated value
of $r/Z$ is positive, that is, the following inequality, derived
from (2.6), holds:

$$\hat{l}_Z + \hat{l}_X x + \hat{l}_Y y > 0. \tag{3.22}$$

Substituting $\hat{l}_X$, $\hat{l}_Y$ and $\hat{l}_Z$ with the corresponding expressions
in equations (3.14) and dividing by $l_Z$, we obtain the
equivalent constraint:

$$\frac{U_Z}{\hat{U}_Z} + \frac{\epsilon_{XY} (\hat{U}_X s_Y - \hat{U}_Y s_X)}{\hat{U}_Z (\hat{U}_X^2 + \hat{U}_Y^2)} - \frac{(\hat{U}_X U_X + \hat{U}_Y U_Y)(s_X x + s_Y y) + \epsilon_{XY} (s_X y - s_Y x)}{\hat{U}_X^2 + \hat{U}_Y^2} > 0. \tag{3.23}$$

If the slopes $s_X$ and $s_Y$ are small and the region $\mathcal{R}$ is
small, then, usually, the second and third terms in (3.23) are
small and the inequality is satisfied by vectors $\hat{U}$ in a large
neighborhood of $U$. Note that anyway these conditions are
among those already specified as contributing to ambiguity.

To conclude, the following conditions contribute to ambiguity
in recovering 3-D motion and structure parameters:


• The FOV is small.

• The locality factor $\gamma$ is small.

• The planar surface is at most moderately slanted.

• The object is far away.

• The absolute value of the translation is small.

• The resolution of the image is coarse.

• The noise level (in pixels) is high.

• The flow field is sparse.

• The noise values in adjacent flow vectors are highly
correlated.

3.2.2 An arbitrary surface

Referring to equation (2.5), the 'reciprocal depth' map, 1/Z, can be generally represented by

$$\frac{1}{Z(x,y)} = k_Z + k_X x + k_Y y + \epsilon(x,y), \tag{3.24}$$

where $\epsilon(x,y)$ is the difference between 1/Z and the approximating linear function $k_Z + k_X x + k_Y y$. Using this representation and the normalization $\lambda = r\epsilon$, we can rewrite the flow field equations (2.2):

$$\alpha(x,y) = \alpha_\psi(x,y) + (U_X - xU_Z)\,\lambda(x,y) \tag{3.25a}$$

and

$$\beta(x,y) = \beta_\psi(x,y) + (U_Y - yU_Z)\,\lambda(x,y), \tag{3.25b}$$

where $(\alpha_\psi, \beta_\psi)$ is the $\Psi$ transformation corresponding to the planar surface $k_X X + k_Y Y + k_Z Z = 1$.

Given $\hat{U}$, we can usually choose rotation parameter values $\hat{\Omega}$ and normalized plane parameter values $\hat{l}$ which maintain the correct zeroth- and first-order components of the $\Psi$ transformation. If, in addition, for each flow vector we choose the value of $\lambda$ as the correct one, then the error field corresponding to these motion and structure parameters is

$$\begin{pmatrix} \Delta\alpha \\ \Delta\beta \end{pmatrix} = \begin{pmatrix} \Delta\alpha_\psi \\ \Delta\beta_\psi \end{pmatrix} + \lambda \begin{pmatrix} \Delta U_X - x\,\Delta U_Z \\ \Delta U_Y - y\,\Delta U_Z \end{pmatrix}, \tag{3.26}$$

where $\Delta\alpha_\psi$ and $\Delta\beta_\psi$ are the errors associated with the planar surface (equation (3.15)) and $(\Delta U_X, \Delta U_Y, \Delta U_Z)$ is the error in the normalized translation vector. Therefore, we can expect instability in determining the 3-D motion and structure if, in addition to the conditions associated with planar surfaces, the function 1/Z can be reasonably approximated by a linear function, i.e., $\lambda(x,y)$ is small. Note that this condition means that the environmental surface can be relatively approximated by a planar surface, that is, the distance between the two surfaces is small relative to the distance from the sensor to the real surface. This approximation can usually be improved as the FOV or the locality factor $\gamma$ is reduced, unless there is a significant discontinuity in the depth map.

3.3 Examples

In this section we demonstrate the influence of three parameters on the degree of instability in recovering 3-D information from the flow field induced by a rigid motion of a planar surface. These parameters are the locality factor $\gamma$, the ratio r/d of the translation magnitude r to the distance d from the camera to the surface along the line of sight, and the slope of the planar surface. The demonstration is based on several examples, where in each example a dense flow field is simulated. The technique presented in [ADI85a,b] for estimating 3-D motion and structure is then employed. In this technique, the search for 3-D motion parameters is based on a least-squares procedure which minimizes the deviation between the given flow field and that predicted from the computed parameters. The minimization algorithm employs an error measure corresponding to possible locations of the FOE in the image plane. For each hypothesized FOE, the optimal rotation parameters, the sign of the translation vector, and a related error value are computed. A minimal value of the resulting error function is determined, using a multi-resolution sampling technique.

The error function can be defined on the unit hemisphere $H = \{U : |U| = 1,\ U_Z > 0\}$, which is homeomorphic to the image plane. Employing a spherical coordinate system $(r, \theta, \phi)$, where $\theta$ is the angle between the line of sight and the translation vector, and $\phi$ is the angle between the x-axis and the projection of the translation vector on the image plane, $H$ can be represented by the set $\{(\theta, \phi) : 0° \le \theta < 90°,\ 0° \le \phi < 360°\}$. The angles $(\theta, \phi)$ are used in Figures 3.2 to 3.8 as polar coordinates.
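The parameterization of the hemisphere can be sketched in a few lines of Python. This is only an illustration under the angle definitions given above; the function names `translation_direction` and `focus_of_expansion` are hypothetical and do not come from [ADI85a,b].

```python
import math


def translation_direction(theta_deg, phi_deg):
    """Unit translation vector U for hemisphere angles given in degrees.

    theta is measured from the line of sight (the Z axis); phi is measured
    from the x-axis to the projection of U on the image plane.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def focus_of_expansion(theta_deg, phi_deg):
    """Image-plane FOE (U_X / U_Z, U_Y / U_Z), in focal units."""
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("theta must lie in [0, 90) degrees on the open hemisphere")
    ux, uy, uz = translation_direction(theta_deg, phi_deg)
    return (ux / uz, uy / uz)
```

With this convention, a translation along the line of sight (theta = 0) maps to the center of the polar plot, and the FOE moves toward infinity in the image plane as theta approaches 90 degrees.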


Figure 3.2: The error function in example 1. The surface is non-slanted, r/d = 0.1, and $\gamma$ = 1. Note that $\theta$ ranges from 0° at the center up to 90° at the boundary.

The sharpness of the error function around the correct value determines the sensitivity to noise in estimating the translation axis and, therefore, also in estimating the rotation parameters and the environmental structure. In all the examples the FOV is 60°, the number of pixels is 128 x 128, the camera translation is T = (0, 0, 10) and the rotation is (0, 0, 0). In the first three cases the surface, defined by the equation Z = 100, is parallel to the image plane. The locality factor $\gamma$, on the other hand, is different in each of these experiments: 1 in the first, 1/2 in the second and 1/4 in the third. The contour maps in Figures 3.2, 3.3 and 3.4 show the drastic change in the sharpness of the corresponding error functions. The contours are labeled by the corresponding error values, given in pixels, and the correct solution is marked by a black dot.

In examples 4 to 7 we choose again $\gamma$ = 1, but the planar surface is varied. In examples 4 and 5 the surface is still parallel to the image plane, but its distance from the camera is 200 in example 4, and 400 in example 5. Thus, the influence of the relative translation magnitude r/d is demonstrated by examples 1, 4 and 5, in which r/d is 0.1, 0.05 and 0.025, respectively. The results in Figures 3.2, 3.5 and 3.6 clearly show that smaller values of r/d are associated with higher levels of ambiguity.

In examples 6 and 7 the distance d is kept at 100, but the surfaces, defined respectively by Z = 100 + 0.414X and Z = 100 + X, are slanted: 22.5° in example 6 and 45° in example 7. Figures 3.2, 3.7 and 3.8 show that the error function becomes sharper as the surface becomes more slanted. The basic reason for this relation is the depth variation associated with slanted surfaces. This variation helps in resolving the ambiguity in distinguishing between the translational and rotational components of the flow field, since the first component is affected by variations in depth, while the second component is independent of the depth values.

Note the second accurate solution in Figures 3.7 and 3.8, which corresponds, according to equations developed in [WAX83], to a situation where the surface is non-slanted but the translatory motion is not along the line of sight; instead, the motion deviates by 22.5° and 45°, respectively, from this line. The relative translation along the line of sight, that is, $T_Z/d$, is still 0.1 for these alternative solutions. Since in these cases the translational motion along the X-axis is non-zero, the ratio r/d is larger than 0.1, specifically 0.1082 in experiment 6, and 0.1414 in experiment 7.
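The numbers quoted for the slanted examples and their alternative solutions follow from elementary trigonometry. The short check below is a sketch; the helper names `surface_slope` and `alternative_ratio` are hypothetical, and the only assumption is that the alternative translation keeps its component along the line of sight at $T_Z/d = 0.1$ while deviating from that line by the slant angle.

```python
import math


def surface_slope(slant_deg):
    """Slope dZ/dX of a plane slanted by slant_deg about the Y axis."""
    return math.tan(math.radians(slant_deg))


def alternative_ratio(deviation_deg, tz_over_d=0.1):
    """r/d for a translation deviating from the line of sight by deviation_deg,
    given that its component along the line of sight satisfies T_Z/d = tz_over_d."""
    return tz_over_d / math.cos(math.radians(deviation_deg))


for example, angle in [(6, 22.5), (7, 45.0)]:
    print(example, round(surface_slope(angle), 3), round(alternative_ratio(angle), 4))
```

The slopes come out as 0.414 and 1.0, matching the surface equations of examples 6 and 7, and the ratios as 0.1082 and 0.1414.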

The solution in experiment 1 and the alternative solutions in experiments 6 and 7 correspond to situations where the surface is parallel to the image plane. Yet, there is a large difference in the sharpness of the error functions around these solutions. This difference may partly be due to the change in r/d. The second factor which apparently influences the degree of ambiguity in these cases is the deviation between the line of sight and the translation axis. At


Figure 3.3: The error function in example 2. The surface is non-slanted, r/d = 0.1, and $\gamma$ = 0.5.


Figure 3.4: The error function in example 3. The surface is non-slanted, r/d = 0.1, and $\gamma$ = 0.25.


Figure 3.6: The error function in example 5. The surface is non-slanted, r/d = 0.025, and $\gamma$ = 1.


Figure 3.7: The error function in example 6. The surface is slanted (22.5°), r/d = 0.1, and $\gamma$ = 1. Note the second solution, which corresponds to a non-slanted surface and a translation not along the line of sight (the angle between the translation vector and the line of sight is 22.5°).


Figure 3.8: The error function in example 7. The surface is slanted (45°), r/d = 0.1, and $\gamma$ = 1. The second solution corresponds again to a non-slanted surface and a translation not along the line of sight (this time, the angle between the translation vector and the line of sight is 45°).



least in the case of a non-slanted planar surface, it seems that the instability is reduced as the translation vector increasingly deviates from the line of sight.


Tlatormlned 


In ambiguous situations, when the surface can be relatively approximated by a plane, we can still recover useful information in terms of partial constraints on the motion and structure parameters. Usually, the coefficients of the 0th and 1st order components of the flow field, that is, the coefficients $a_1, \ldots, a_6$ of the $\Psi$ transformation (see equations (2.8a-f)), can be reliably estimated. Integration of these constraints over a time sequence of flow fields may, eventually, resolve the ambiguity and result in a unique interpretation.

If a planar patch is independently moving and the camera is stationary, then the ambiguity is, at least partially, the result of using a camera coordinate system. In this coordinate system $a_1$ and $a_4$ are sums of the X and Y translations (normalized by the distance d from the camera to the object along the line of sight) and rotations. It may be very difficult, however, to determine the correct decomposition into the rotational and translational components. On the other hand, it is possible to define an 'object coordinate system' which is parallel to the camera coordinate system, but whose center is shifted to the surface along the line of sight. In this coordinate system $a_1$ and $a_4$ are, respectively, the X and Y translations normalized by d. Hence, these normalized translations can be reliably recovered.

Let us now examine the situation where at least one of the two following conditions is satisfied: (a) the translation is along the line of sight, that is, $U_X = U_Y = 0$; (b) the surface can be relatively approximated by a planar surface parallel to the image plane, that is, $l_X = l_Y = 0$. Note that this situation is very common in real scenes. Employing equations (2.8c,e), $a_3 = -a_5$ in this case, and $\Omega_Z$ can be estimated by $(a_5 - a_3)/2$. In addition, $a_2 = a_6$ and $T_Z$, normalized by the distance to the object along the line of sight, can be estimated by $(a_2 + a_6)/2$ (with the sign convention of equations (4.1b) and (4.1f), where $a_2 = a_6 = -l_Z U_Z$, this estimate is positive when the object approaches the camera). In this situation, this is the inverse of the time-to-collision and, therefore, we can usually obtain a reasonably accurate estimate of this important parameter, even when ambiguity in recovering 3-D information does exist.
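The two estimates above amount to simple averages of first-order coefficients. The following sketch assumes the coefficients $a_2$, $a_3$, $a_5$, $a_6$ have already been fitted to the flow field; the function name `estimate_axial_parameters` is hypothetical, and the sign of the second estimate follows the convention $a_2 = a_6 = -l_Z U_Z$ of equations (4.1b) and (4.1f), so that a positive value means the object is approaching.

```python
def estimate_axial_parameters(a2, a3, a5, a6):
    """Estimate the rotation about the line of sight and the inverse
    time-to-collision from first-order Psi coefficients.

    Valid only when U_X = U_Y = 0 or l_X = l_Y = 0, in which case
    a3 = -Omega_Z, a5 = Omega_Z and a2 = a6 = -l_Z * U_Z.
    """
    omega_z = (a5 - a3) / 2.0
    inverse_time_to_collision = (a2 + a6) / 2.0
    return omega_z, inverse_time_to_collision


# Example: Omega_Z = 0.02 and l_Z * U_Z = -0.1 (object approaching).
omega_z, inv_ttc = estimate_axial_parameters(a2=0.1, a3=-0.02, a5=0.02, a6=0.1)
print(omega_z, inv_ttc)
```

Averaging the two coefficients in each pair, rather than using one of them, reduces the influence of independent estimation noise in the fitted coefficients.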

In order to show how the situation discussed above can be detected, we will prove that when $a_3 = -a_5$ and $a_2 = a_6$, then $U_X = U_Y = 0$ and/or $l_X = l_Y = 0$. That is, the first equalities are not only necessary but also sufficient conditions for the latter ones. To prove this, notice that the equalities $a_3 = -a_5$ and $a_2 = a_6$, combined with equations (2.8b,c,e,f), lead to the equalities:

$$l_Y U_X = -l_X U_Y, \tag{3.27a}$$

$$l_X U_X = l_Y U_Y. \tag{3.27b}$$

Assuming that $U_X$, $U_Y$, $l_X$ and $l_Y$ are all non-zero, we can divide each side of the first equation by the corresponding side of the second equation, and thus obtain $l_Y/l_X = -l_X/l_Y$, which leads to a contradiction: $(l_Y/l_X)^2 = -1$. Therefore, at least one of the quantities $U_X$, $U_Y$, $l_X$, $l_Y$ must be 0. Suppose now that $U_X = 0$; examining equations (3.27), it follows that $l_X U_Y = 0$ and $l_Y U_Y = 0$ and, therefore, $U_Y = 0$ and/or $l_X = l_Y = 0$. Similarly, each of the conditions $U_Y = 0$, $l_X = 0$, $l_Y = 0$ leads to the desired result.

Another approach which may be taken in order to deal with instability in recovering motion parameters is based on representing possible values of these parameters by a probabilistic distribution function. Such a function can be defined, for example, on the unit hemisphere $H$, using the computed values of the corresponding error field in [ADI85a,b].


We demonstrated in Section 3 that in some situations there exists a large set of motion parameters which, assuming the presence of noise, are consistent with the flow field generated by a rigidly moving object. Suppose now that two independently moving objects are given. If the two corresponding solution sets of motion parameters are large, then the possibility that these sets intersect each other is not negligible. Such an intersection corresponds to 3-D motion parameters which are consistent with both objects. In this case the optical flows can be interpreted as resulting from one rigidly moving object. Note that, to the best of our knowledge, this ambiguity has not yet been addressed in the literature.

To demonstrate the possibility of ambiguity in decomposing the flow field into sets corresponding to rigid objects, we wish to show that there exist non-trivial situations in which we can find motion parameters compatible with the flows generated by two independently moving objects. If the surfaces of the objects can be relatively approximated by planes, then, following Section 3.2, we can examine this possibility of ambiguity by trying to compute motion and structure parameters which are consistent with the coefficients $a_1, \ldots, a_6$ of the associated $\Psi$ transformations. The resulting errors in the second-order components of the optical flows will be small if, for example, the field of view is small enough. Since the six motion parameters $\hat{U}$ and $\hat{\Omega}$ should be the same for both objects but the three structure parameters can be different, we obtain, including the constraint $\hat{U}_X^2 + \hat{U}_Y^2 + \hat{U}_Z^2 = 1$, 13 equations with 12 unknowns. It is reasonable to expect that in many situations these equations do have a solution.


>1 


Continuing the introductory discussion, let us now examine, as we did in Section 3.4, the common situation where, for each of the two objects, at least one of the following conditions (not necessarily the same) is satisfied: the translation is along the line of sight, or the surface can be relatively approximated by a planar surface parallel to the image plane. In such a situation $U_X = U_Y = 0$ or $l_X = l_Y = 0$, and $U'_X = U'_Y = 0$ or $l'_X = l'_Y = 0$, where the parameters associated with the second object are marked by the symbol $'$. In addition, let us assume that $\Omega_Z = \Omega'_Z$ and that the signs of $U_Z$ and $U'_Z$ are the same, where $U_Z = 0$ if and only if $U'_Z = 0$. Finally, to guarantee a solution, it is assumed that if $U_Z \neq 0$ and $l_Z U_Z = l'_Z U'_Z$, then $\Omega_Y + l_Z U_X = \Omega'_Y + l'_Z U'_X$ and $-\Omega_X + l_Z U_Y = -\Omega'_X + l'_Z U'_Y$.

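Employing equations (2.8a) to (2.8f), we can obtain the following equations, related to the first object, where the unknowns are denoted by the symbol '^':

$$\hat{\Omega}_Y + \hat{l}_Z \hat{U}_X = a_1 \quad (= \Omega_Y + l_Z U_X), \tag{4.1a}$$

$$\hat{l}_X \hat{U}_X - \hat{l}_Z \hat{U}_Z = a_2 \quad (= -l_Z U_Z), \tag{4.1b}$$

$$-\hat{\Omega}_Z + \hat{l}_Y \hat{U}_X = a_3 \quad (= -\Omega_Z), \tag{4.1c}$$

$$-\hat{\Omega}_X + \hat{l}_Z \hat{U}_Y = a_4 \quad (= -\Omega_X + l_Z U_Y), \tag{4.1d}$$

$$\hat{\Omega}_Z + \hat{l}_X \hat{U}_Y = a_5 \quad (= \Omega_Z), \tag{4.1e}$$

$$\hat{l}_Y \hat{U}_Y - \hat{l}_Z \hat{U}_Z = a_6 \quad (= -l_Z U_Z). \tag{4.1f}$$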

A similar set of equations can be obtained for the second object.
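The left-hand sides of equations (4.1a) to (4.1f) define a forward map from candidate motion and structure parameters to the six first-order coefficients. A minimal sketch of that map is given below; the function name `psi_first_order` and the tuple layout are illustrative choices, not part of the original formulation.

```python
def psi_first_order(omega, u, l):
    """Coefficients (a1, ..., a6) predicted by the left-hand sides of (4.1a-f).

    omega = (Omega_X, Omega_Y, Omega_Z): rotation parameters
    u     = (U_X, U_Y, U_Z): normalized (unit) translation vector
    l     = (l_X, l_Y, l_Z): normalized plane parameters
    """
    ox, oy, oz = omega
    ux, uy, uz = u
    lx, ly, lz = l
    return (
        oy + lz * ux,        # a1, equation (4.1a)
        lx * ux - lz * uz,   # a2, equation (4.1b)
        -oz + ly * ux,       # a3, equation (4.1c)
        -ox + lz * uy,       # a4, equation (4.1d)
        oz + lx * uy,        # a5, equation (4.1e)
        ly * uy - lz * uz,   # a6, equation (4.1f)
    )


# A fronto-parallel patch (l_X = l_Y = 0) gives a3 = -a5 and a2 = a6.
print(psi_first_order(omega=(0.01, 0.02, 0.03), u=(0.6, 0.0, 0.8), l=(0.0, 0.0, 0.5)))
```

Two parameter sets are indistinguishable at first order exactly when this map returns the same six numbers for both, which is the condition the solution process in this section tries to satisfy for two objects at once.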

Let us start the solution process by choosing $\hat{\Omega}_Z = \Omega_Z$, $\hat{l}_X = \hat{l}_Y = 0$ and $\hat{l}'_X = \hat{l}'_Y = 0$, thus satisfying equations (4.1c) and (4.1e), as well as the corresponding equations associated with the second object. We proceed by examining the case $U_Z \neq 0$, in which we constrain $\hat{U}_Z$ to be non-zero and to have the same sign as the sign of $U_Z$. Thus, from (4.1b) and (4.1f) we can obtain

$$\hat{l}_Z = l_Z \frac{U_Z}{\hat{U}_Z}. \tag{4.2}$$

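Substituting this expression in (4.1a) and (4.1d) yields

$$\hat{\Omega}_Y + l_Z U_Z \hat{m}_x = a_1 \tag{4.3a}$$

and

$$-\hat{\Omega}_X + l_Z U_Z \hat{m}_y = a_4, \tag{4.3b}$$

where $(\hat{m}_x, \hat{m}_y) = (\hat{U}_X/\hat{U}_Z, \hat{U}_Y/\hat{U}_Z)$ is the corresponding FOE. Similarly, we can obtain the following equations, corresponding to the second object:

$$\hat{\Omega}_Y + l'_Z U'_Z \hat{m}_x = a'_1, \tag{4.4a}$$

$$-\hat{\Omega}_X + l'_Z U'_Z \hat{m}_y = a'_4. \tag{4.4b}$$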

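Combining (4.3) with (4.4) yields

$$a_1 - l_Z U_Z \hat{m}_x = a'_1 - l'_Z U'_Z \hat{m}_x \tag{4.5a}$$

and

$$a_4 - l_Z U_Z \hat{m}_y = a'_4 - l'_Z U'_Z \hat{m}_y. \tag{4.5b}$$

If $l_Z U_Z = l'_Z U'_Z$, then, according to our assumptions, $a_1 = a'_1$ and $a_4 = a'_4$, and, therefore, we can choose arbitrary values of $\hat{m}_x$ and $\hat{m}_y$. Otherwise,

$$\hat{m}_x = \frac{a_1 - a'_1}{l_Z U_Z - l'_Z U'_Z} \tag{4.6a}$$

and

$$\hat{m}_y = \frac{a_4 - a'_4}{l_Z U_Z - l'_Z U'_Z}. \tag{4.6b}$$

The values of $\hat{\Omega}_X$ and $\hat{\Omega}_Y$ can now be computed from equations (4.3) or (4.4).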
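The case $U_Z \neq 0$ can be sketched as a short routine that takes the first-order coefficients of the two objects and returns a common FOE and rotation. This is an illustration under the assumptions of this section (in particular $l_X = l_Y = 0$ for the recovered surfaces, so that $l_Z U_Z = -a_2$ by (4.1b)); the function name `common_motion_nonzero_uz` is hypothetical.

```python
def common_motion_nonzero_uz(a, a_prime, tol=1e-12):
    """Common FOE and (Omega_X, Omega_Y) for two objects, case U_Z != 0.

    a, a_prime: tuples (a1, ..., a6) of the two Psi transformations.
    Uses l_Z*U_Z = -a2 and l'_Z*U'_Z = -a'2, then equations (4.6) and (4.3).
    """
    a1, a2, _, a4, _, _ = a
    b1, b2, _, b4, _, _ = a_prime
    c, c_prime = -a2, -b2
    if abs(c - c_prime) < tol:
        raise ValueError("l_Z U_Z equals l'_Z U'_Z: the FOE can be chosen arbitrarily")
    m_x = (a1 - b1) / (c - c_prime)   # equation (4.6a)
    m_y = (a4 - b4) / (c - c_prime)   # equation (4.6b)
    omega_y = a1 - c * m_x            # from equation (4.3a)
    omega_x = c * m_y - a4            # from equation (4.3b)
    return (m_x, m_y), (omega_x, omega_y)


foe, rotation = common_motion_nonzero_uz(
    a=(0.05, -0.1, 0.0, 0.02, 0.0, -0.1),
    a_prime=(0.01, -0.2, 0.0, -0.03, 0.0, -0.2),
)
print(foe, rotation)
```

Substituting the returned values into (4.4a) and (4.4b) reproduces $a'_1$ and $a'_4$, which is a convenient sanity check on any implementation of this step.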

Let us now examine the complementary case where $U_Z = 0$. In this case, to satisfy equations (4.1b) and (4.1f), we choose $\hat{U}_Z = 0$. Combining equations (4.1a) and (4.1d) with the corresponding equations associated with the second object yields

$$a_1 - \hat{l}_Z \hat{U}_X = a'_1 - \hat{l}'_Z \hat{U}_X \tag{4.7a}$$

and

$$a_4 - \hat{l}_Z \hat{U}_Y = a'_4 - \hat{l}'_Z \hat{U}_Y. \tag{4.7b}$$

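Therefore,

$$(\hat{l}'_Z - \hat{l}_Z)\hat{U}_X = a'_1 - a_1 \tag{4.8a}$$

and

$$(\hat{l}'_Z - \hat{l}_Z)\hat{U}_Y = a'_4 - a_4. \tag{4.8b}$$

Thus, since $\hat{U}_X^2 + \hat{U}_Y^2 = 1$,

$$\hat{l}'_Z - \hat{l}_Z = \pm\sqrt{(a'_1 - a_1)^2 + (a'_4 - a_4)^2}. \tag{4.9}$$

If $a_1 = a'_1$ and $a_4 = a'_4$, then $\hat{l}_Z = \hat{l}'_Z$, and any $\hat{U}_X$ and $\hat{U}_Y$ which satisfy $\hat{U}_X^2 + \hat{U}_Y^2 = 1$ are legitimate solutions. Otherwise, that is, if $a_1 \neq a'_1$ or $a_4 \neq a'_4$, then

$$\hat{U}_X = \frac{a'_1 - a_1}{\pm\sqrt{(a'_1 - a_1)^2 + (a'_4 - a_4)^2}}, \tag{4.10a}$$

$$\hat{U}_Y = \frac{a'_4 - a_4}{\pm\sqrt{(a'_1 - a_1)^2 + (a'_4 - a_4)^2}}. \tag{4.10b}$$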


To finish the solution process, we should choose $\hat{l}_Z$ and $\hat{l}'_Z$ which satisfy the constraint (4.9) and then, using equations (4.1a) and (4.1d), determine the values of $\hat{\Omega}_X$ and $\hat{\Omega}_Y$. It is optimal to select the values of $\hat{l}_Z$ and $\hat{l}'_Z$ such that the resulting errors in the coefficients $a_7$ and $a_8$ of the $\Psi$ transformations will be minimal.

4.3 An Example

In order to demonstrate how different motions can be interpreted as one rigid motion, let us examine the case where two planar patches, parallel to the image plane, are independently translating. Both translations are assumed to be parallel to the image plane, but one object is translating in parallel to the X-axis, generating flow values of (-0.04, 0) (in focal units), and the second object is translating in parallel to the Y-axis, generating flow values of (0, 0.03) (see Figure 4.1). Note that $a_1 = -0.04$, $a'_4 = 0.03$ and the other coefficients of the $\Psi$ transformations associated with the objects are 0.




Figure  4.1:  The  optical  flows  induced 
by  the  translation  of  two  objects. 


We wish to recover motion parameters $\hat{U}$ and $\hat{\Omega}$ which are compatible with both sets of flow vectors and with structure parameters $\hat{l}$ and $\hat{l}'$ corresponding, respectively, to the first and second object. Employing the results in Section 4.2 for the case $U_Z = 0$ (equations (4.10)), we have $\hat{U} = (\pm 0.8, \pm 0.6, 0)$, $\hat{\Omega}_Z = 0$, $\hat{l}_X = \hat{l}_Y = 0$, $\hat{l}'_X = \hat{l}'_Y = 0$ and $\hat{l}'_Z - \hat{l}_Z = \pm 0.05$. In addition, using equations (4.1a) and (4.1d), $\hat{\Omega}_X = \pm 0.6\,\hat{l}_Z$ and $\hat{\Omega}_Y = -0.04 \mp 0.8\,\hat{l}_Z$.

Since $\hat{\Omega}_X$ and $\hat{\Omega}_Y$ are exactly the errors in the coefficients $a_7$ and $a_8$ of the $\Psi$ transformations, we wish to minimize

$$\chi(\hat{l}_Z) = \hat{\Omega}_X^2 + \hat{\Omega}_Y^2 = 0.36\,\hat{l}_Z^2 + (0.04 \pm 0.8\,\hat{l}_Z)^2. \tag{4.11}$$

Let us now distinguish between the cases $\hat{l}'_Z > \hat{l}_Z$ and $\hat{l}'_Z < \hat{l}_Z$. In the first case $\hat{U} = (0.8, 0.6, 0)$ and the first derivative of $\chi(\hat{l}_Z)$ is $2\hat{l}_Z + 0.064$. Since $\hat{l}_Z$ is constrained to be positive, the minimum of $\chi(\hat{l}_Z)$, obtained for $\hat{l}_Z = 0$, is, in this case, 0.0016. Note that $\hat{l}_Z = 0$ means that the object is at infinity, which is, of course, unrealistic; however, taking a sufficiently large distance of the object from the camera, $\hat{l}_Z$ can be arbitrarily close to 0.

In the second case $\hat{U} = (-0.8, -0.6, 0)$, the derivative of $\chi(\hat{l}_Z)$ is $2\hat{l}_Z - 0.064$ and $\hat{l}_Z$ should be at least 0.05, so that $\hat{l}'_Z = \hat{l}_Z - 0.05$ is not negative. Hence, the minimum of $\chi(\hat{l}_Z)$, achieved for $\hat{l}_Z = 0.05$, is, in this case, 0.0009. The optimal solution is, therefore, $\hat{U} = (-0.8, -0.6, 0)$, $\hat{l} = (0, 0, 0.05)$, $\hat{l}' = (0, 0, 0)$ and $\hat{\Omega} = (-0.03, 0, 0)$. Assuming small second order terms of the rotational component, this solution can be graphically represented by Figure 4.2.
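The two cases of this example can be checked numerically. The sketch below is an illustration only: it takes $a_1 = -0.04$ and $a'_4 = 0.03$ from the example, uses equations (4.10) for $\hat{U}$, equations (4.1a) and (4.1d) with $\hat{l}_X = \hat{l}_Y = 0$ for the rotation, and relies on $\chi$ being a convex quadratic in $\hat{l}_Z$. The helper names `chi` and `best_for_sign` are hypothetical.

```python
import math

a1, a4 = -0.04, 0.0   # first object
b1, b4 = 0.0, 0.03    # second object (primed coefficients)
norm = math.hypot(b1 - a1, b4 - a4)   # |l'_Z - l_Z| from (4.9), here 0.05


def rotation(lz, sign):
    """(Omega_X, Omega_Y) from (4.1d) and (4.1a) for the given sign in (4.9)."""
    ux, uy = sign * (b1 - a1) / norm, sign * (b4 - a4) / norm   # (4.10)
    return lz * uy - a4, a1 - lz * ux


def chi(lz, sign):
    omega_x, omega_y = rotation(lz, sign)
    return omega_x ** 2 + omega_y ** 2


def best_for_sign(sign):
    """Minimize chi over l_Z >= 0 subject to l'_Z = l_Z + sign*norm >= 0."""
    ux, uy = sign * (b1 - a1) / norm, sign * (b4 - a4) / norm
    lz_free = a1 * ux + a4 * uy            # unconstrained minimizer of chi
    lz_min = max(0.0, -sign * norm)        # feasibility bound
    lz = max(lz_free, lz_min)              # chi is convex in l_Z
    return lz, chi(lz, sign)


for sign in (+1, -1):
    lz, value = best_for_sign(sign)
    print(sign, round(lz, 4), round(value, 6), rotation(lz, sign))
```

Under these assumptions the first case gives a minimum of 0.0016 at $\hat{l}_Z = 0$, and the second case gives 0.0009 at $\hat{l}_Z = 0.05$ with a rotation of approximately (-0.03, 0).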

Since $\hat{\Omega}_Y = 0$, there is no error in $a_7$; on the other hand, there is an error in $a_8$, which is $-\hat{\Omega}_X = 0.03$. The corresponding discrepancy between the correct flow field and that predicted from the above parameters is small if the FOV is small, or both the size of the objects and their distance from the line of sight are small relative to their distance from the camera.

4.4 Conclusions Regarding the Rigidity Assumption

We have just shown that a rigidity assumption, similar to the one proposed in [ULL79], is not appropriate when the flow field is noisy, that is, the consistency of a set of flow vectors with the same 3-D motion parameters does not reasonably guarantee that they are really induced by one rigidly moving object. Observing, in addition, that almost any set which contains fewer than 5 flow vectors is consistent with some $\Psi$ transformation, we propose a modified assumption: a set of at least 5 adjacent flow vectors, which are compatible, up to the estimated noise level, with a rigid motion of a planar patch, will be assumed to be induced by one rigidly moving object. This assumption has been successfully applied [ADI85a,b] to segment noisy flow fields.

In some situations, however, the consistency of two sets of flow vectors with the same motion parameters is still strong evidence for the hypothesis that these sets are generated by one rigidly moving object. This is the case, for example, when accurate motion parameters can be separately recovered for each set. In such a situation, similarity of the results is not likely to be accidental.

Nevertheless, in general we still must accept the possibility of ambiguity in grouping flow vectors into sets corresponding to rigidly moving objects. Hence, the interpretation of the flow field should result in a set of possible decompositions, rather than only one decomposition. Each hypothesized object can be assigned a probability value, based on the number of segments composing the object's flow and on the degree of ambiguity in separately recovering the motion parameters associated with each of them.




Figure 4.2: Graphical representation of the optimal solution. The flow vectors corresponding to the first and second object are denoted by g and 1*, respectively. The rotational component of the optimal solution is ij, while the translational component corresponding to the first object is gy and the translational component corresponding to the second object is (0, 0).

5. Conclusions

We have characterized and demonstrated situations in which there exists an inherent ambiguity in the interpretation of noisy flow fields. The first ambiguity is in recovering the motion parameters from a noisy flow field generated by a rigid motion. We found that if the field of view corresponding to the region containing the interpreted flow field is small, and the depth variation and translation magnitude are small relative to the distance of the object from the camera, then the determination of the 3-D motion and structure can be expected to be very sensitive to noise and, in the presence of a realistic level of noise, practically impossible. We experimentally found that there is also a relation between the location of the FOE and the degree of ambiguity. This relation should be mathematically investigated in future research.

The second ambiguity is in the decomposition of the flow field into sets corresponding to independently moving objects. We found that the rigidity assumption is not appropriate for noisy flow fields, that is, the consistency of a set of flow vectors with the same motion parameters, up to the estimated noise level, does not reasonably guarantee that they are really induced by one rigid motion. As an alternative to this assumption, it is assumed in [ADI85a,b] that a connected set of flow vectors, which is consistent with a rigid motion of a planar surface, is induced by a single rigid motion. This assumption is weaker than the first version of the rigidity assumption in the sense that it can only be applied in more restricted situations and, therefore, it is more likely to be correct.

The results of the ambiguity analysis can be used when the effectiveness of motion algorithms is evaluated for real-world tasks. They can help to decide which algorithm to choose, and in what situations this algorithm can be expected to be effective.

Constraints and parameters which can be extracted, even in ambiguous situations, were also introduced. Integration of such partial information over a time sequence of flow fields may, eventually, resolve the ambiguity and result in a unique interpretation. In addition, combining this information with other knowledge sources (e.g., a fiber optic rotation sensor [LAW84]) can be considered.

Recovering motion and structure of independently moving objects may be particularly difficult, as was demonstrated by the flat error surfaces obtained for such objects in the second and fifth experiments in [ADI85b]. In general, ambiguity in recovering 3-D motion and structure of independently moving objects can be expected, since the effective field of view and the ratio of the depth variation to the distance between the object and the camera are usually small. Furthermore, additional information from other knowledge sources may be hard to acquire. Therefore, the possibility of partially resolving the ambiguity in such a case, by using an object coordinate system, is especially interesting and should be investigated in future research.


I would like to thank Ed Riseman and Al Hanson for useful comments for improving this paper. I am also indebted to Brian Burns for his help in preparing the contour maps in Figures 3.2 to 3.8.


[ADI85a] G. Adiv, Determining 3-D Motion and Structure from Optical Flow Generated by Several Moving Objects, IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, pp. 384-401, July 1985.

[ADI85b] G. Adiv, Interpreting Optical Flow, Ph.D. Dissertation, Computer and Information Science Dept., Univ. of Mass., 1985.

[FAN83a] J.-Q. Fang and T.S. Huang, Solving Three Dimensional Small-Rotation Motion Equations, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Washington, D.C., 1983, pp. 253-258.

[FAN83b] J.-Q. Fang and T.S. Huang, Estimating 3-D Movement of a Rigid Object: Experimental Results, in Proc. Int. Joint Conf. Artificial Intell., Karlsruhe, Germany, 1983, pp. 1035-1037.

[HAN78] A. Hanson and E. Riseman (Eds.), 'Computer Vision Systems', Academic Press Inc., New York, NY, 1978, pp. 303-334.

[LAW84] D.T. Lawton, Processing Dynamic Image Sequences from a Moving Sensor, Ph.D. Dissertation (TR 84-05), Computer and Information Science Dept., Univ. of Mass., 1984.

[LON80] H.C. Longuet-Higgins and K. Prazdny, The Interpretation of a Moving Retinal Image, Proc. Roy. Soc. Lond., B, vol. 208, pp. 385-397, July 1980.

[LON81] H.C. Longuet-Higgins, A Computer Algorithm for Reconstructing a Scene from Two Projections, Nature, vol. 293, pp. 133-135, Sep. 1981.

[PRA80] K. Prazdny, Egomotion and Relative Depth Map from Optical Flow, Biol. Cybernetics, vol. 36, pp. 87-102, 1980.

[RAL65] A. Ralston, 'A First Course in Numerical Analysis', McGraw-Hill, New York, NY, 1965.

[RIE83] J.H. Rieger and D.T. Lawton, Determining the Instantaneous Axis of Translation from Optic Flow Generated by Arbitrary Sensor Motion, in Proc. Workshop Motion: Representation and Perception, Toronto, Canada, 1983, pp. 33-41.

[ROA80] J.W. Roach and J.K. Aggarwal, Determining the Movement of Objects from a Sequence of Images, IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, pp. 554-562, Nov. 1980.

[TSA84] R.Y. Tsai and T.S. Huang, Uniqueness and Estimation of Three-Dimensional Motion Parameters of Rigid Objects with Curved Surfaces, IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 13-27, Jan. 1984.

[ULL79] S. Ullman, 'The Interpretation of Visual Motion', MIT Press, Cambridge, Mass., 1979.

[ULL81] S. Ullman, Analysis of Visual Motion by Biological and Computer Systems, Computer, vol. 14, pp. 57-69, Aug. 1981.

[WAX83] A.M. Waxman and S. Ullman, Surface Structure and 3-D Motion from Image Flow: A Kinematic Approach, CAR-TR-24, Center for Automation Research, Univ. of Maryland, 1983.




Refinement Of Environmental Depth Maps Over Multiple Frames

Saraj Bharwani, Edward Riseman, Allen Hanson
Computer and Information Sciences Department
University of Massachusetts
Amherst, MA 01003


Abstract


In this paper we examine the task of constructing a
reliable depth map of the environment from a sequence of
images obtained from a camera undergoing translational
motion. Even when the motion of the camera is known,
local ambiguities occur in the matching of features from
one frame to the next, leading to ambiguity in the recovery
of environmental depth. This paper first examines the
sources of error in computing depth and then proposes
mechanisms to obtain coarse estimates of depth, predict
displacements and refine the depth map for selected feature
points. The technique iteratively improves the accuracy
of the depth estimates over a sequence of frames, while
maintaining constant computational limits on processing
between frames. Both startup and updating strategies
follow as part of a hierarchical spatial and temporal
processing paradigm. The results of a preliminary
implementation are presented and discussed.


1  Introduction 


The process of recovering structure from general motion
has been studied by several researchers. Some have
used point correspondence and the rigidity assumption to
recover the 3-D structure [Ullm83,Tsai84], some have
combined the problem of measuring motion and of recovering
structure using global/local search techniques [Lawt84],
while others rely on the properties of optical flow to recover
surface structure [Waxm83,Adiv85]. A problem shared by
each of these is that the process requires more computation
than that which is available at frame rate for real time
implementation. In addition, almost all of the work has been
restricted to processing only two image frames.

*This research was supported by the US Army ETL under contract
number DACA76-85-C-0008 and by Defense Advanced Research
Projects Agency under contract number N00014-82-K-0464.

In this paper we are concerned with the problem of
recovering depth maps over multiple frames for a known
translational motion of a camera. The focus of expansion
(FOE) is the point where the axis of camera translation
intersects the image plane. The displacement path is
defined as the path along the line in the image connecting the
point and the FOE. For a camera in translation, points in
the image are displaced along the displacement path. Given
the axis of translation of the camera, the depth of a point
can be determined from the position of the point and the
extent of its displacement relative to the focus of expansion
(FOE) or the focus of contraction (FOC). This relation is
given by

$$ Z = \frac{T_z D}{d} \tag{1} $$

where $T_z$ is the displacement of the camera along the
Z axis from time $t$ to time $t+1$, $Z$ is the Z-coordinate
(called the depth of the point relative to the camera) of
an environmental point at time $t+1$, $D$ is the distance
of the corresponding image point from the FOE or FOC
at time $t$, and $d$ is the magnitude of the displacement of
the image point from time $t$ to $t+1$ [Ball82]. Refer to
Figure 1 for details on the geometry of the camera and
the relation of the camera coordinate system $(Y, Z)$ to the
world coordinate or the global coordinate system $(\hat{Y}, \hat{Z})$.
The depth of a point can thus be computed from equation
(1) if the magnitude of the camera translation is known for
the time interval between any two frames; otherwise, only
relative depth can be computed as a function of camera
translation.
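
The relation between depth and displacement in equation (1) can be checked with a short derivation. The sketch below assumes a pinhole camera of focal length $f$ translating along its optical axis toward the scene, with the FOE at the image origin and the point at lateral offset $X$ from the axis; $f$ and $X$ are illustrative symbols that do not appear in the text.

```latex
\begin{align*}
D     &= \frac{fX}{Z + T_z}
      && \text{image distance from the FOE at time } t \\
D + d &= \frac{fX}{Z}
      && \text{image distance from the FOE at time } t+1 \\
d     &= \frac{fX}{Z} - \frac{fX}{Z + T_z}
       = \frac{fX\,T_z}{Z\,(Z + T_z)} \\
\frac{D}{d} &= \frac{Z}{T_z}
      \quad\Longrightarrow\quad Z = \frac{T_z D}{d}
\end{align*}
```

Under these assumptions $f$ and $X$ cancel, so the depth at time $t+1$ depends only on the camera translation and on two quantities measured in the image.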

For schemes that use correlation for finding
correspondences between features, a common cause of
inconsistencies in the recovery of environmental depth of the surface
patch associated with the feature is the occurrence of no
matches for features, perhaps due to occlusion, or ambiguous
matches resulting from an inability to sufficiently confine
the search interval for finding correspondence in
subsequent frames. Additionally, there is a limit on the accuracy
with which displacements can be measured given the limit
on the amount of computation available to carry out
interpolation for obtaining subpixel accuracy. It is our goal to
first identify the source of such difficulties in obtaining
accurate depth maps. Later we discuss our approach to the
computation of depth maps which is based on the assumption
that accurate construction of environmental depth is
a gradual process requiring only coarse estimates of depth
in the early stages of processing (i.e. the 'start-up' phase



Figure 1: The camera geometry and the coordinate system.
$(Y, Z)$ is the camera-centered coordinate system and
$(\hat{Y}, \hat{Z})$ is the world coordinate system.


where no depth maps initially exist). These estimates can
then  be  iteratively  updated  over  several  frames  using  finer 
resolution  matching  in  a  prediction  and  refinement  scheme. 

2  Representation 

Currently our representation of the world map involves
the representation of depth of a set of feature points in the
image. Good candidates for feature points are those that
are likely to have a unique match such as points of high
curvature along image contours, etc. Points with such unique
local characteristics could be selected by applying an
interest operator [Kitc80,Mora77]. Once the features have been
selected, they are tagged with a symbolic identifier. The
key information, however, is that each feature point carries
with it a list that contains the following information:

• the x and y coordinates of the point in the image;

• the depth estimate of the point from the previous
time step;

• the expected accuracy of the depth estimate from the
previous time step.

In some cases it may be desirable to maintain a longer
processing history. In this case, the information attached
to each feature point becomes a list of lists.

The depth computation in our framework is always
performed with respect to the camera-centered coordinate
system. Since the camera translates through
three-dimensional space, all predictions from the previous time
step $t - 1$ have to be transformed to be consistent with
the origin of the current coordinate system at time $t$.
Similarly, any comparisons in depth for verification would also
require a transformation. If the camera continues to move
at constant speed then the measurements from $N$ frames
previously will need to have their $z$ coordinate modified
by $N \cdot T_z$ in order to compare them.
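
The per-feature record and the coordinate shift described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, field names and the helper `depth_in_current_frame` are hypothetical, and the shift assumes a constant forward translation `t_z` per frame so that depth decreases as the camera advances.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FeaturePoint:
    """One tracked feature point (all names here are illustrative)."""
    ident: str             # symbolic identifier of the feature
    x: float               # image x coordinate
    y: float               # image y coordinate
    depth: float           # depth estimate from the previous time step
    depth_accuracy: float  # expected accuracy (+/-) of that estimate
    # Optional longer processing history: a "list of lists".
    history: List[Tuple[float, float, float, float]] = field(default_factory=list)

    def update(self, x: float, y: float, depth: float, depth_accuracy: float) -> None:
        """Archive the current record, then store the new measurement."""
        self.history.append((self.x, self.y, self.depth, self.depth_accuracy))
        self.x, self.y = x, y
        self.depth, self.depth_accuracy = depth, depth_accuracy


def depth_in_current_frame(old_depth: float, frames_ago: int, t_z: float) -> float:
    """Re-express a depth measured `frames_ago` frames earlier in the current
    camera-centered system, assuming constant translation t_z per frame."""
    return old_depth - frames_ago * t_z
```

For example, a depth of 12.0 cm measured three frames earlier with a translation of 0.2 cm per frame would be compared against current measurements as 11.4 cm.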

3  Errors  In  Depth  Measurement 

Errors in depth measurement from translational motion
arise from several sources and the effect of each must be
understood. A brief list of such sources is given below:

• Errors due to incorrect match along a displacement
path can produce arbitrary image displacement error
and therefore arbitrarily large depth error.

• No match (i.e. lack of an acceptable match) might
occur due to

1. an insufficiently large search area along the
displacement path for the match;

2. occlusion of the feature in the next frame;

3. noise, highlights, shadows and surface
distortion.

• An error in the FOE which causes an incorrect
displacement path for most points in the image (this
source of error is assumed to be minimal in this
paper).

• The accuracy in measured displacements is limited
by the discrete nature of the matching/correlation
(search) process; i.e., only a discrete set of window
positions are checked for possible matches. The search
interval is the distance between these points and the
displacement accuracy corresponds to this interval as
shown in Figure 2.

The prediction-guided depth refinement process that
we discuss later in this paper focuses upon some of these
types of error. In particular, the accuracy in the
computation of displacement between one pair of frames can be
used to limit the interval within which to search for the
displaced point in subsequent frames.

The relationship between the displacement accuracy
and the depth accuracy can be obtained as follows. Let
$\delta d$ be the magnitude of the accuracy in measuring the
displacement of the point in the image and let $\delta Z$ be the
corresponding accuracy in depth of the point. We have

$$ Z_t = \frac{T_z D_t}{d_t} \tag{2} $$

where $Z_t$ and $D_t$ are the depth and the distance of the
point from the FOE respectively and $d_t$ is the displacement
of the point from time $t - 1$ to time $t$. Differentiating
(2) with respect to $d_t$ results in:

$$ |\delta Z_t| = \frac{T_z D_t}{d_t^2}\,|\delta d_t| \tag{3} $$

For a given $d_t$, we can substitute for it in (3), using
$d_t = T_z D_t / Z_t$ from (2), to obtain

$$ |\delta Z_t| = \frac{Z_t^2}{T_z D_t}\,|\delta d_t| \tag{4} $$

Hence,  for  obtaining  accurate  depths,  measuring  dis¬ 
placements  accurately  becomes  more  critical  the  farther 
the  point  is  from  the  camera.  In  actual  situations,  one 
could  expect  the  camera  to  be  in  constant  motion,  with 
image  frames  acquired  at  some  fixed  temporal  interval.  In 
these  situations,  the  process  of  computing  environmental 
depths  can  be  viewed  as  an  iterative  refinement  process, 
characterised  by  measurement  of  depth  followed  by  predic¬ 
tion  of  image  displacement  and  refinement  of  displacement 
and  depth  estimates.  Since  we  have  an  estimate  of  the  ac¬ 
curacy  in  computed  depth  at  time  t,  we  can  compute  the 
expected  accuracy  in  displacement  at  t  + 1  from 

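$$ |\delta d_{t+1}| = \frac{T_z D_{t+1}}{\hat{Z}_{t+1}^2}\,|\delta \hat{Z}_{t+1}| \tag{5} $$

where $\hat{Z}_{t+1}$ and $\delta \hat{Z}_{t+1}$ are the expected depth and the
expected accuracy in depth respectively at time $t + 1$. The
expected depth at time $t + 1$ is given by

$$ \hat{Z}_{t+1} = Z_t - T_z \tag{6} $$

so that

$$ |\delta \hat{Z}_{t+1}| = |\delta Z_t| $$

if we assume that the translation of the camera $T_z$ is known
accurately.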
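
The relations between depth, displacement and their accuracies can be exercised numerically. The following sketch is only an illustration of the formulas as read here ($Z = T_z D / d$, $|\delta Z| = Z^2 |\delta d| / (T_z D)$, $\hat{Z}_{t+1} = Z_t - T_z$); the function and parameter names are hypothetical and the numbers in the usage note are made up.

```python
def depth_from_displacement(t_z: float, dist_foe: float, disp: float) -> float:
    """Depth from camera translation t_z, distance from the FOE and
    measured image displacement: Z = T_z * D / d."""
    return t_z * dist_foe / disp


def depth_accuracy(t_z: float, dist_foe: float, depth: float, disp_acc: float) -> float:
    """Accuracy in depth implied by an accuracy in displacement:
    |dZ| = Z^2 / (T_z * D) * |dd|."""
    return depth * depth / (t_z * dist_foe) * disp_acc


def predict_next(depth: float, depth_acc: float, t_z: float):
    """Expected depth and expected depth accuracy one frame later,
    assuming the camera translation t_z is known exactly."""
    return depth - t_z, depth_acc


def displacement_accuracy(t_z: float, dist_foe: float,
                          exp_depth: float, exp_depth_acc: float) -> float:
    """Expected accuracy in displacement from the expected depth accuracy:
    |dd| = T_z * D / Z^2 * |dZ| (the inverse of depth_accuracy)."""
    return t_z * dist_foe / (exp_depth * exp_depth) * exp_depth_acc
```

With `t_z = 0.2`, `dist_foe = 100` and a displacement accuracy of 0.5 pixel, a point displaced by 2 pixels lies at depth 10 with accuracy 2.5, while a point displaced by 1 pixel lies at depth 20 with accuracy 10: doubling the depth quadruples the depth uncertainty, which is why accurate displacements matter more for distant points.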

4 Computing Depth Without Prediction

A block diagram of the system used in this and
succeeding sections is shown in Figure 3. The diagram represents
two systems, one without prediction between frames and
the other with prediction between frames (the multiframe
algorithm). The point tracking module is common to both
and is a correlation-based matching algorithm which
operates on two successive frames: an image at time $t_1$ and
another at time $t_2$. A 3 x 3 window of data around each
feature point is moved in discrete steps along the
displacement path and correlation values between it and the
corresponding windows in the second frame are obtained. The
window location which results in the highest correlation
is assumed to correspond to the displaced location of the
feature point in the second frame. From the computed
displacement, the depth of the point in the second frame is
computed.
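
The point tracking step can be sketched in a few lines. This is a hedged illustration, not the authors' module: the text does not say which correlation measure is used, so normalized cross-correlation is assumed here, window positions are rounded to the nearest pixel, images are plain lists of rows, and all function names are hypothetical. The caller is assumed to keep every window inside the image.

```python
import math


def window(img, cx, cy):
    """3 x 3 window of intensities centered on the pixel nearest (cx, cy)."""
    r, c = int(round(cy)), int(round(cx))
    return [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists
    (assumed measure; returns 0.0 for a window with no variation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def match_along_path(img0, img1, pt, foe, max_disp, step=1.0):
    """Slide the window along the line from the FOE through `pt` (away from
    the FOE) in increments of `step` pixels, up to `max_disp`, and return the
    displacement with the highest correlation together with that score."""
    dx, dy = pt[0] - foe[0], pt[1] - foe[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist          # unit vector along the displacement path
    ref = window(img0, pt[0], pt[1])
    best_d, best_score = 0.0, -2.0         # NCC is never below -1
    for i in range(int(max_disp / step) + 1):
        d = i * step
        score = ncc(ref, window(img1, pt[0] + d * ux, pt[1] + d * uy))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

The returned displacement, together with the distance of the point from the FOE and the known camera translation, is what the depth relation needs.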

If we know the ground truth depth then we can
compute the depth error at the point by computing the
absolute difference between the computed depth and the ground
truth depth. For this reason, the experiments described
later in this section and in section 5 used a set of synthetic
images generated by a ray tracing algorithm [Whit80].
Synthetic images were used, as opposed to an actual world
scene, because

• the images generated by the ray tracing algorithm
have characteristics very much like those of real
images and include shadows, reflections and transparency;

• the camera geometry and the parameters for camera
translation appropriate for our experiments were
easily controlled during the image generation process;

• the ground truth depth map representing the actual
depth of the points from the focal point of the
camera can be conveniently obtained and compared with
the depth map computed using the multiframe
algorithm.

Figure 4 is a typical picture of a 256 x 256 image
generated by the ray tracing algorithm. The camera is
translated 0.2 cm. along the optical axis between any pair of
consecutive frames. The optical axis of the camera is at
-15° relative to the positive $\hat{Z}$-axis, and the translation
of the camera causes points to be displaced by less than 7
pixels in the following image. Hence we use a search
interval of 10 pixels (a conservative estimate) when searching
for correspondence along the displacement path.


Figure 4: 256 x 256 image generated by the ray tracing
algorithm. The vectors are unit vectors at each interest
point along the displacement path.

4.1  Experiments 

Several experiments using a sequence of five synthetic
images labelled 0 through 4 have been performed to
evaluate the magnitude and source of errors in recovering depth
maps over a sequence of frames. In the first
experiment depth is computed using overlapping pairs of
consecutive frames without predicting feature displacements
between frames. Hence, in the representation discussed in
an earlier section, we only carry the x and y location of
the points from one frame to the next. Therefore, each
processing step of the algorithm consists of finding
correspondences for each point, saving the x and y locations of
each point, computing the depth, and computing the error
in depth.

Results of the experiment at each processing step and
for each feature point are shown in Table 1. The table
indicates that the error in computed depth fluctuates from
one step to the next. The trend is generally upward and
accumulates to high values for a majority of the points by
time $t_4$ when processing terminates. This is to be expected
since errors in the location of the feature points have a
tendency to accumulate.

In the second experiment we observe the depth
computation from frames that are separated by larger camera
translations. Results from this experiment are shown in
Table 2. We find that for the same camera translations,
the errors in the present case are smaller than those in the
previous experiment.

As  expected,  the  results  from  the  second  experiment 


feature   true depth   err. in comp. depth (cm.)
          (cm.)        0-1      1-2      2-3      3-4
   1      13.505       ?        1.698    ?        ?
   2      13.132       4.459    ?        5.271    5.665
   3      11.970       1.701    1.313    1.118    ?
   4      11.633       1.753    1.428    4.171    0.776
   5      11.833       1.511    1.319    1.127    0.934
   6       9.902       4.559    4.721    4.860    5.693
   7      ?            2.480    9.127    3.327    ?
   8      ?            2.990    3.341    0.798    ?
   9       8.951       3.017    ?        2.198    11.534
  10       8.951       2.951    2.529    2.110    8.951

Table 1: Results from the first experiment. In this
experiment, only the positions of the feature points computed
from a pair of frames (e.g. 0-1) are carried over to the next
pair of frames (1-2). Errors in position are cumulative,
resulting in large errors in the computed depth of the feature
points, as shown in the last column.

show  an  improvement  over  those  from  the  first  one.  There 
are  two  reasons  for  this  improvement: 

• the possibility of cumulative tracking errors was
either eliminated (as in the case of 0-2, 0-3 and 0-4)
since only two frames were used or was minimized
(as is the case of 0-2-4)¹;

• large camera translations give large displacements for

feature   true depth   err. in comp. depth (cm.)
          (cm.)        0-2      2-4      0-3      0-4
   1      13.505       0.898    ?        ?        ?
   2      13.132       1.123    0.381    0.229    ?
   3      11.970       1.508    0.720    1.312    1.118
   4      11.633       0.084    0.241    ?        ?
   5      11.633       1.319    0.538    1.127    0.934
   6       9.902       4.816    1.288    2.188    ?
   7       9.902       1.121    0.312    0.251    1.345
   8       9.902       3.244    0.416    0.131    1.304
   9       8.951       2.794    1.201    0.386    ?
  10       8.351       2.510    ?        ?        ?

Table 2: Results from the second experiment. The table
shows the results for three separate computations. The
first two columns (0-2, 2-4) are a two step computation
similar to that in Table 1, except that frames 1 and 3
were eliminated, resulting in larger displacements of the
feature points. The last two columns each used only a single
pair of frames (0-3, 0-4).


points in the image and consequently lower relative
errors in displacements.

These experiments identify some of the sources of
errors which we had originally expected. First, the increase
in the errors in computed depth is a result of the difficulty
in successfully tracking points over a sequence of frames.
Small errors in locating a match at each step are
compounded over several processing steps. A consequence of
matching errors is errors in displacements which in turn
cause errors in computed depths.

Second, displacement can only be measured to within
±1/2 of the resolution of the search for matching. Hence, the
spatial resolution of the search limits the accuracy with
which depth can be computed. Effects on accuracy due
to noise are not applicable in our case because we use a
controlled set of images [however, see Pavl85].

We believe that both the tracking and the matching
problem can be addressed effectively by constraining the
search interval (for matching) with depth information
available from the preceding frames. In the next section we
discuss an approach which requires only coarse estimates
of depth from the early stages of processing and then
refines it over a sequence of frames by searching at a finer
resolution.

¹0-2-4 stands for an experiment in which depth was first computed
by using frames 0 and 2 and then, using the new locations of the
points in frame 2, depth was computed from frames 2 and 4. This
is different from 0-4 which represents depth computations directly
from frames 0 and 4. Both the experiments compute the depth of
the points in frame 4, yet the results from each could be different.




5  Prediction  And  Refinement 

To recover depth maps of the environment reliably over
a sequence of frames, it is important to reduce the
possibility of matching errors. An improvement in matching
is expected to positively affect the tracking problem. We
have been using the following constraints to search for the
displaced points:

• points can only move along the displacement path;

• points in the image can move no more than a fixed
number of pixels between a pair of frames.

In addition, the following observation can be used to
improve the matching:

• given a current estimate of the depth of a point and
its accuracy, the displacement of the point in the next
step can be predicted to fall within an interval along
the displacement path.
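
The last observation can be turned into a search interval directly from the depth relation $d = T_z D / Z$. The sketch below is one plausible reading, not the authors' code: it assumes the camera moves toward the scene by `t_z` per frame, that the true depth lies within `depth_acc` of the expected depth, and the function name is hypothetical.

```python
def predicted_search_interval(t_z: float, dist_foe: float,
                              depth: float, depth_acc: float):
    """Range of image displacements (in pixels) consistent with the current
    depth estimate and its accuracy, for the next frame."""
    exp_depth = depth - t_z                      # expected depth at the next step
    near = max(exp_depth - depth_acc, 1e-9)      # guard against non-positive depth
    far = exp_depth + depth_acc
    d_min = t_z * dist_foe / far                 # farthest depth: smallest displacement
    d_max = t_z * dist_foe / near                # nearest depth: largest displacement
    return d_min, d_max
```

For instance, with `t_z = 0.2`, a point 100 pixels from the FOE and a depth estimate of 10.2 ± 2.0, the search can be confined to displacements between roughly 1.67 and 2.5 pixels instead of the full fixed interval.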

In the start-up situation, the system has no prior depth
information. This situation could arise either when a new
set of images is input to the system or when the scene
changes drastically, as in the case of turning around a
corner. In this case, since the interval of search is large
(because the accuracy is low), we minimise computation by
searching at a coarse spatial resolution to obtain the depth
estimates. Subsequent processing can use these estimates
to reduce the search interval and conduct search at fine
spatial resolution.

Once the depth map is determined accurately, any
processing thereafter can be characterised by a similar
prediction-refinement cycle which can be aided by concurrent
processing at lower temporal resolutions. By processing with
frames separated by large time intervals, the depth
computations at frame-rate (which defines the maximum
temporal resolution) can be verified and corrected (if necessary).


5.1 Experiments

In this section, we describe preliminary experimental
results from the multiframe algorithm. In this algorithm,
the information carried forward from the analysis of frames
$t_i$ and $t_{i+k}$ ($k \geq 1$) includes not only the predicted
positions of the feature points but also the depth of the
feature points as well as the expected accuracy of the depths.
From the computed depths, a bound on the expected
displacements of the feature points in frame $t_{i+l}$ ($l > k$) is
computed and used to limit the local search for the
maximum of the correlation function.

An image sequence of 8 images labelled 0 through 7 was
generated for this experiment. The camera translation
between frames, the optical axis, and the axis of translation
are identical to the ones specified for the previous
experiment. To incorporate temporal resolution, we choose to
use only frames 0, 4, 6 and 7, which represents an increase
in temporal resolution by a factor of two at every step
after the first one. Also, in order to improve the
accuracy of depth computation by a factor of two, the spatial
resolution of the search is 1 pixel in the first step and is
increased thereafter by a factor of four for every
subsequent step. The processing steps for the experiment can
be described briefly as follows:

1. compute depth from image frames 0 and 4 by
conducting a search at 1 pixel resolution to determine
displacements;

2. use the depth estimates from step 1 to predict the
displacements in frame 6;

3. repeat step 1 with frames 4 and 6, but conduct search
at 1/4 pixel resolution;

4. use the depth estimates from step 3 to predict the
displacements in frame 7;


                       expected accuracy in depth (± cm.)   err. in comp. depth (cm.)
feature   true depth   0-4,      4-6,       6-7,            0-4      4-6      6-7
          (cm.)        1 pix     1/4 pix    1/16 pix
   1      12.923       0.924     0.360      0.159           0.816    0.049    0.278
   2      13.608       1.776     0.639      0.333           1.098    0.327    0.307
   3      13.510       1.423     0.533      0.231           1.522    0.356    0.123
   4      12.387       2.617     1.866      0.676           1.116    1.780    0.243
   5      12.256       1.304     0.710      0.294           1.025    0.235    0.388
   6      10.239       6.770     0.293      0.128           1.346    0.722    0.460
   7      10.239       1.047     0.378      0.160           1.0??    0.212    0.121
   8      10.327       3.744     1.107      0.436           5.450    2.727    1.777
   9       9.366       1.169     0.396      0.164           0.78?    0.100    9.444
  10       8.350       0.436     0.162      0.075           0.301    0.038    0.109

Table 3: Results from the third experiment using
prediction of the displacements of the feature points from frame
to frame to constrain the local search. See text for details.


5. repeat step 1 with frames 6 and 7, but conduct the
search at 1/16 pixel resolution.
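
The reason this schedule is expected to halve the depth uncertainty at each refinement step can be seen from a small calculation. The sketch below is a simplification made for illustration only: it holds the depth and the distance from the FOE fixed across steps, takes the displacement accuracy as half the search resolution, and uses hypothetical names throughout.

```python
# (frame pair, search resolution in pixels) for the three matching steps
SCHEDULE = [((0, 4), 1.0), ((4, 6), 0.25), ((6, 7), 0.0625)]


def expected_depth_accuracy(depth, dist_foe, t_z, frame_gap, search_res):
    """|dZ| = Z^2 / (T * D) * |dd|, with total translation T = frame_gap * t_z
    and displacement accuracy |dd| = search_res / 2."""
    disp_acc = search_res / 2.0
    return depth ** 2 / (frame_gap * t_z * dist_foe) * disp_acc


def accuracy_schedule(depth, dist_foe, t_z):
    """Expected depth accuracy at each matching step of the schedule."""
    return [expected_depth_accuracy(depth, dist_foe, t_z, b - a, res)
            for (a, b), res in SCHEDULE]
```

Halving the frame gap doubles the uncertainty, while a fourfold finer search divides it by four, so under these assumptions each step improves the expected accuracy by a net factor of two. In the actual experiment the depth and the distance from the FOE also change between frames, so the ratios need not be exactly two.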

The results of the experiment are shown in Table 3. A
set of 10 interest points were selected for recovering a depth
map over time. Values under the heading Expected
Accuracy in Depth are those that were computed (using
equation (3)) from the corresponding accuracy in
measuring displacements at a given spatial resolution of search.
Ideally, the error in computed depth should fall within the
bounds of the expected accuracy in depth measurement at
each step of the refinement process. We find this to be
true for all the error values in step 1. At step 3 we find
that the computed depth of 80% of the points is within the
expected accuracy and this figure drops to 70% for step 5.
Even though the actual error reduces from step 1 to step 5,
the rate of reduction in error for some points after the first
step is less than the rate of expected increase in accuracy
(in depth) at higher spatial resolutions of search.

The observed discrepancies could result from the
violation of the following assumptions which are implicit in our
matching approach:

• matching is done around the global maximum of the
correlation function;

• the correlation function decreases monotonically at
points away from the global maximum;

• the global maximum can be measured with high
accuracy.

If the global maximum of the correlation function cannot
be measured with high accuracy, doing search at a high
resolution will find only local peaks in the function and
would not increase the accuracy in computed depth. Study
of the shape of the correlation function in order to
understand the limits on achievable accuracy is the subject of
future work.

6 Future Work

In this paper we have identified several sources of
errors in recovering environmental depth maps from known
translational motion using multiple frames. We have also
demonstrated, through experiments, the role of prediction
and refinement in dealing with the start-up problem and
the problem of tracking points over a sequence of frames.
These are preliminary studies only and much work remains
to be done. The following areas in particular require more
investigation.

1. The assumption that the global maximum of the
correlation function has a sharp peak and decreases
monotonically at points away from the peak is not
valid at all points in an image. We intend to obtain
a better characterisation of the shape of the
correlation function in order to determine the accuracy with
which the maximum of the function can be localised.
Such an analysis will identify a spatial resolution of
search which is best suited for the scale of the
intensity variation at a point in the image.

2. We intend to explore the possibility of representing
the image frames at multiple spatial resolutions in
order to improve the matching scheme without
sacrificing computational efficiency. In order to recognise
low frequency variations in the image, it is necessary
to use large correlation windows, which tends to
increase computation. By representing the image at
a lower spatial resolution and using a smaller
correlation window for matching [Glas83,Anan84],
significant improvements in the computational effort can
be realised. An implementation of the multiframe
algorithm is possible which processes image frames
at multiple spatial resolutions at each time step to
obtain greater accuracy in matching and in tracking
feature points over multiple frames.

3. We also intend to investigate the possibility of
integrating depth information from processes at different
temporal resolutions. As discussed earlier, there are
definite advantages in projecting the depth estimates
from processes at low temporal resolutions to those
at high temporal resolutions.

4. We shall soon be evaluating the performance of the
multiframe algorithm on sequences of images
representing scenes from corridors and hallways inside a
building as well as road scenes which are similar to
those required for the ALV (Autonomous Land
Vehicle) project.
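
The multiple-resolution representation proposed in item 2 can be illustrated with a simple image pyramid. This sketch uses plain 2 x 2 block averaging, which is an assumption made for brevity; the hierarchical correlation schemes cited in the text may use different smoothing and sampling, and the function names are hypothetical.

```python
def reduce_by_two(img):
    """Halve the resolution of an image (a list of rows) by averaging
    each non-overlapping 2 x 2 block of pixels."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(img, levels):
    """Return [full resolution, half resolution, quarter resolution, ...]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(reduce_by_two(pyramid[-1]))
    return pyramid
```

A 3 x 3 window at the second level of such a pyramid covers roughly the same image area as a 6 x 6 window at full resolution, which is the sense in which a small window at low resolution can stand in for a large one.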

References 

[Adiv85] Adiv, G., Determining 3-D Motion and Structure
from Optical Flow Generated by Several Moving
Objects, COINS Technical Report 84-07, University
of Massachusetts, April 1984.

[Anan84] Anandan, P., A Confidence Measure for
Correlation Matching, Proc. of DARPA Image
Understanding Workshop, New Orleans, LA, 1984.

[Ball82] Ballard, D.H. and C.M. Brown, Computer Vision,
Prentice Hall, Inc., 1982.

[Glas83] Glazer, F., G. Reynolds and P. Anandan, Scene
Matching by Hierarchical Correlation, IEEE CVPR
Conference, June 1983, pp. 432-441.

[Kitc80] Kitchen, L. and A. Rosenfeld, Gray-Level
Corner Detection, TR-887, Computer Science Center,
Univ. of Maryland, April 1980.

[Lawt84] Lawton, D.T., Processing Dynamic Image
Sequences from a Moving Sensor, Ph.D. Dissertation
(TR 84-05), Computer and Information Science Dept.,
Univ. of Mass. (1984).

[Mora77] Moravec, H.P., Towards Automatic Visual
Obstacle Avoidance, Proc. of the 5th IJCAI, MIT,
Cambridge, MA, 1977.

[Tsai84] Tsai, R.Y. and T.S. Huang, Uniqueness and
Estimation of Three-Dimensional Motion Parameters
of Rigid Objects with Curved Surfaces, PAMI 6
(1984), pp. 13-27.

[Ullm83] Ullman, S., Maximizing Rigidity: The
Incremental Recovery of 3-D Structure From Rigid and
Rubbery Motion, A.I. Memo No. 721, MIT AI Lab,
June 1983.

[Waxm83] Waxman, A.M. and S. Ullman, Surface
Structure and 3-D Motion from Image Flow: A
Kinematic Approach, CAR-TR-24, Center for Automation
Research, Univ. of Maryland (1983).

[Whit80] Whitted, T., An Improved Illumination Model
for Shaded Display, Communications of the ACM,
23(6), June 1980.

[Pavl85] Pavlin, I., Analysis of an Algorithm for Detection
of Translational Motion, in these proceedings.


Acknowledgements 

The authors want to thank P. Anandan for many
helpful suggestions and comments throughout the course of this
work. They also thank I. Pavlin for his image generator,
and R. Heller, R. Belknap, B. Burns and M. Snyder for
their help on several occasions.




MULTIRESOLUTION  PATH  PLANNING 
FOR  MOBILE  ROBOTS 

Subbarao Kambhampati
Larry  S.  Davis 

Computer  Vision  Laboratory 
Center  for  Automation  Research 
University  of  Maryland 
College  Park,  MD  20742 


ABSTRACT 

The problem of automatic collision-free path planning
is central to mobile robot applications. In this
report, we present an approach to automatic path planning
based on a quadtree representation. We introduce
hierarchical path searching methods, which make use of
this multiresolution representation, to speed up the path
planning process considerably. Finally, we discuss the
applicability of this approach to mobile robot path planning.

1.  INTRODUCTION 

The problem of automatic collision-free path planning
is central to mobile robot applications. Path planning
for mobile robots is in many ways different from the
more familiar case of path planning for manipulators (see
also the discussion in [Thorpe84]). For example:

(1) A mobile robot may have only an incomplete model
of its environment, perhaps because it constructs this
model using vision and thus cannot determine what
is occluded by an object.

(2) A mobile robot will ordinarily negotiate any given
path only once (as opposed to a manipulator which
might perform the same task thousands of times).
This implies that it is more important to develop a
negotiable path quickly than it is to develop an
"optimal" path, which is usually a costly operation.

(3) A mobile robot will be moving according to a previously
computed path while it is computing an extension
or modification to that path; also the path
being developed may change as more information
about the environment is discovered. Thus path
planning in mobile robots is a continuous online
process rather than a single offline process as in the
case of a manipulator.

Conventional path planning algorithms can be
divided broadly into two categories. In the first category
are the methods which make trivial (if any) changes to
the representation of the image map before planning a
path. The regular grid search [Thorpe84] and vertex
graph methods [Moravec81] [Thompson77] [Nilsson69] fall
into this category.

Though these methods keep the representational cost
to a minimum, their applicability to mobile robot navigation
is limited. For example, the regular grid search is
"too local" [Wallace83] [Thorpe84] and its path planning
cost increases with grid size rather than with the number
of obstacles present. Further, both regular grid search
and vertex graph methods generate paths which clip
obstacle corners.

The methods in the second category make elaborate
representation changes to convert to a representation
which is easier to analyze before planning the path. Free
space methods [Brooks83a], medial axis transform
methods, Voronoi methods, etc., fall into this category.
A potential practical shortcoming of such methods for
mobile robot navigation is that the path planning cost is
still very high because of the representation conversion
process involved.

Though the above two categories by no means
exhaust the existing methods (there are configuration
space methods that use a vertex graph approach
[Lozano79] and others that use a free space approach
[Lozano81] to solve the manipulator findpath problem),
they do point out that what mobile robots need may be a
compromise between these two categories.

It is these considerations that motivated the
multiresolution (hierarchical) representation based path
planning algorithms described in this report (see also
[DavisHa]). Similar considerations also led to the use of
hierarchical representations in manipulator "findpath"
problems (see Section 4 for a discussion of related work).
In this report, we first develop a method of path planning
for mobile robots using a hierarchical representation
based on quadtrees and then describe staged search as a
way of exploiting the hierarchical nature of the representation
to gain substantial computational savings.
Throughout this report we restrict our attention to
two-dimensional path planning without rotation and a vehicle
with circular cross-section.

The support of the Defense Advanced Research Projects Agency
and the U.S. Army Night Vision and Electro-Optics Laboratory
under Contract DAAK70-83-K-0018 is gratefully acknowledged.

Section 2 develops a quadtree planning algorithm
based on A* search. Section 3 presents a staged
(hierarchical) path planning algorithm which has
computational advantages as compared to the pure A* search
on quadtrees. The staged search involves inclusion of
gray nodes in the search. Section 4 discusses related
work and finally, Section 5 summarizes the conclusions
reached from this research and discusses future directions.
In the remainder of this section we define some terms
used in these discussions.

Quadtree related terminology: A quadtree is a
recursive decomposition of a 2-D picture into uniformly
colored $2^i \times 2^i$ blocks (e.g., see Figure 2.1) [Samet83]. A
node of a quadtree represents a $2^i \times 2^i$ square region of
the picture. A free node of a quadtree is a node of the
quadtree representing a region of free space. An obstacle
node is a node representing a region of obstacles. A gray
node is a node representing a region having a mixture of
free space and obstacles. A leaf node of a quadtree is a tip
node of the tree. In ordinary quadtrees, leaf nodes are
always obstacle nodes or free nodes, but in pruned quadtrees
(see below), they may also be gray nodes. For any
gray node G, S(G) denotes the subtree rooted at G.
L(G) denotes the number of leaf nodes in S(G). The
gray content of a gray node G is defined as the number
of obstacle pixels in the region represented by G, and the
grayness of G is the percentage of obstacle pixels in that
region.

A pruned quadtree, $Q_t$, of a quadtree, $Q$, is generated
by making some of the gray nodes, $G_i$, into leaf
nodes, thus pruning the subtrees, $S(G_i)$, rooted at the
$G_i$. The pruned quadtree, $Q_t$, thus represents the same
space as $Q$, but with a reduced resolution.
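The terminology above can be made concrete with a small sketch. The class name `QNode` and the function names are hypothetical and chosen only for illustration; the sketch assumes an ordinary (unpruned) quadtree, in which every leaf is either a free node or an obstacle node.

```python
from dataclasses import dataclass, field
from typing import List

FREE, OBSTACLE, GRAY = "free", "obstacle", "gray"

@dataclass
class QNode:
    size: int                       # side length of the square block, in pixels
    color: str                      # FREE, OBSTACLE or GRAY
    children: List["QNode"] = field(default_factory=list)  # empty for leaves

def leaf_count(node):
    """L(G): number of leaf nodes in the subtree S(G) rooted at node."""
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children)

def gray_content(node):
    """Number of obstacle pixels in the region represented by node."""
    if not node.children:
        return node.size ** 2 if node.color == OBSTACLE else 0
    return sum(gray_content(c) for c in node.children)

def grayness(node):
    """Percentage of obstacle pixels in the region represented by node."""
    return 100.0 * gray_content(node) / node.size ** 2

# A 4x4 picture: three 2x2 leaves plus one 2x2 gray node with four 1x1 leaves.
root = QNode(4, GRAY, [
    QNode(2, FREE), QNode(2, OBSTACLE), QNode(2, FREE),
    QNode(2, GRAY, [QNode(1, FREE), QNode(1, OBSTACLE),
                    QNode(1, FREE), QNode(1, FREE)]),
])
print(leaf_count(root), gray_content(root), grayness(root))
```

In a pruned quadtree a gray leaf has no children, so its gray content would have to be stored on the node before the subtree is discarded; the sketch does not do this.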

A* terminology: A* is a classical minimum cost
graph search algorithm, whose optimality properties are
well known [Nilsson80]. In this algorithm OPEN is a list
consisting of all the nodes in the search graph that are
generated but not yet expanded. CLOSED is the list of
nodes in the graph that have been expanded. Best node is
the node that is currently being expanded in the search.
This node has the best evaluation (i.e., minimal path
cost) among the nodes on OPEN. The predecessor of a
node N in the search graph is the node preceding N on
the current best path to N (from the start node).

2. QUADTREE BASED PATH PLANNING

2.1.  Representation  preprocessing 

We have developed an algorithm for mobile robot
path planning based on a quadtree representation of the
robot's immediate environment. If there are large areas
of free space (or of obstacles) then those areas can be
represented by a few large blocks in the quadtree and can
be dealt with as units by the planning algorithm.

Given a binary array or raster representation of a
robot's immediate environment we first grow the obstacles
by the radius of the robot's cross-section [Moravec81] and
then convert the raster into a quadtree representation
using a raster to quadtree conversion algorithm
[Samet81]. This algorithm is of complexity O(n) where
n is the number of pixels in the image being converted.
In the resulting quadtree blocks of 0's represent free space
nodes and blocks of 1's represent obstacle nodes.
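The first preprocessing stage can be sketched as follows. This is a minimal illustration, not the raster to quadtree algorithm of [Samet81]: it uses a simple top-down recursion, which is easier to read but does not have the O(n) complexity quoted above. The function names and the tuple layout of the nodes are assumptions made for the example, and the grid is assumed to be square with a side that is a power of two.

```python
def grow_obstacles(grid, radius):
    """Mark every cell within `radius` pixels (Euclidean) of an obstacle cell."""
    n = len(grid)
    r = int(radius)
    grown = [row[:] for row in grid]
    for y in range(n):
        for x in range(n):
            if grid[y][x] != 1:
                continue
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < n and 0 <= xx < n
                    if inside and dx * dx + dy * dy <= radius * radius:
                        grown[yy][xx] = 1
    return grown

def build_quadtree(grid, x=0, y=0, size=None):
    """Leaves are ('free' | 'obstacle', x, y, size);
    mixed blocks are ('gray', x, y, size, [four children])."""
    if size is None:
        size = len(grid)
    cells = {grid[j][i] for j in range(y, y + size) for i in range(x, x + size)}
    if cells == {0}:
        return ("free", x, y, size)
    if cells == {1}:
        return ("obstacle", x, y, size)
    half = size // 2
    kids = [build_quadtree(grid, x + dx, y + dy, half)
            for dy in (0, half) for dx in (0, half)]
    return ("gray", x, y, size, kids)

def leaves(node):
    """Flatten a quadtree into its list of leaf nodes."""
    if node[0] != "gray":
        return [node]
    return [leaf for kid in node[4] for leaf in leaves(kid)]

grid = [[0] * 4 for _ in range(4)]
grid[3][3] = 1                                  # a single obstacle pixel
tree = build_quadtree(grow_obstacles(grid, 1))  # grow by a robot radius of 1
```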

In the second stage of preprocessing, we compute the
distance transform of the set of 0's, i.e., the free space
blocks. This determines, for each block of free space, the
minimal distance between the center of that block and
the boundary of a block of obstacles. Samet [Samet82a]
describes an algorithm for computing this distance
transform for quadtrees which is of complexity O(n),
where n is now the number of leaf nodes in the quadtree.

2.2.  Path  planning  algorithm 

Given the start and goal points, we first determine
the quadtree leaf nodes, S and G, representing the
regions of the image containing these points. Next, we
plan a minimum cost path between S and G in the
graph formed by the non-obstacle leaf nodes of the quadtree,
using the well known A* search algorithm with the
evaluation function f of a node c defined as

$$f(c) = g(c) + h(c)$$

Here g(c) represents the cost of the path from S to c
and h(c) represents the heuristic estimate of the cost of
the remaining path from c to G.

Since the cost of a path should depend both on the
actual distance travelled along the path and the clearance
of the path from the obstacles, we define g(c) as

$$g(c) = g(p) + g(p, c)$$

where g(p) is the cost of the path from S to c's predecessor,
p, on the path and g(p, c) is the cost of the path
segment between p and c. The latter function in turn is
defined as

$$g(p, c) = D(p, c) + \alpha\, d(c)$$

with D(p, c) representing the actual distance between
nodes p and c, given as half the sum of the node sizes,
and d(c) representing the cost incurred by including
node c on the path. d(c) depends upon the clearance of
the node c from the nearby obstacles. We chose a linear
shape for the cost function d, defining d(c) as

$$d(c) = O_{max} - O(c)$$

where O(c) is the distance of the node c from the
nearest obstacle given by the quadtree distance
transform, and $O_{max}$ is the maximum such distance for
any node in the quadtree (so that d(c) is always positive).
$\alpha$ in the equation for g(p, c) is a positive constant
which determines by how far the resultant path will
avoid obstacles.


Finally, h(c) is calculated as the Euclidean distance
between the midpoints of the regions represented by c
and G. Clearly, this measure is a lower bound on the
actual minimum cost path between c and G; thus an A*
search with this measure as its heuristic estimate is
admissible. The power of this heuristic depends upon the
average deviation of the minimum cost path from the
straight line path. It is highest for the case where $\alpha$ is
zero and decreases as $\alpha$ increases. It is of course possible
to use more informed, but inadmissible, heuristics to
speed up this search.
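The evaluation function described in this subsection can be sketched as a small A* routine. This is an illustrative sketch rather than the authors' implementation: blocks are assumed to be `(x, y, size)` tuples, and the adjacency table `neighbors` and the clearance table `clearance` (the values O(c) from the distance transform) are assumed to be supplied by the caller. All names are hypothetical.

```python
import heapq
import math

def center(block):
    x, y, size = block
    return (x + size / 2.0, y + size / 2.0)

def astar(start, goal, neighbors, clearance, alpha=1.0):
    """Minimum cost path with f(c) = g(c) + h(c), where
    g(c) = g(p) + D(p, c) + alpha * d(c) and d(c) = O_max - O(c)."""
    o_max = max(clearance.values())

    def h(c):                                  # Euclidean distance of midpoints
        (cx, cy), (gx, gy) = center(c), center(goal)
        return math.hypot(cx - gx, cy - gy)

    g = {start: 0.0}
    pred = {start: None}
    open_heap = [(h(start), start)]            # OPEN, ordered by f-value
    closed = set()                             # CLOSED
    while open_heap:
        _, best = heapq.heappop(open_heap)
        if best in closed:
            continue                           # stale entry
        if best == goal:
            path, node = [], best
            while node is not None:
                path.append(node)
                node = pred[node]
            return path[::-1], g[goal]
        closed.add(best)
        for c in neighbors[best]:
            D = (best[2] + c[2]) / 2.0         # half the sum of the node sizes
            d = o_max - clearance[c]           # clearance penalty d(c)
            cost = g[best] + D + alpha * d
            if c not in g or cost < g[c]:
                g[c], pred[c] = cost, best
                closed.discard(c)              # reopen if a cheaper path appears
                heapq.heappush(open_heap, (cost + h(c), c))
    return None, math.inf

# Four 2x2 free blocks; block B lies closer to an obstacle than the others.
A, B, C, G = (0, 0, 2), (2, 0, 2), (0, 2, 2), (2, 2, 2)
nbrs = {A: [B, C], B: [A, G], C: [A, G], G: [B, C]}
clr = {A: 2.0, B: 1.0, C: 2.0, G: 2.0}
path, cost = astar(A, G, nbrs, clr, alpha=1.0)
```

With `alpha=1.0` the route through C is preferred because B carries a clearance penalty; with `alpha=0.0` both routes cost the same.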

The node expansion process involves finding the
non-obstacle leaf nodes adjacent to the node being
expanded. We accomplish this by using a neighbor
finding strategy similar to that given by Samet
[Samet82b] with two differences. First, only the neighbors
in the horizontal and vertical directions are
considered; diagonal neighbors, which share only single
points with the current node, would result in inflexible
paths which clip obstacle corners. Secondly, when one of
the neighbors given by the quadtree neighbor finding
algorithm is a gray node, we find the non-obstacle leaf
nodes, if any, of the quadtree rooted at that gray node
that are adjacent to the node being expanded and consider
them as neighbors.
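The adjacency rule used during node expansion (horizontal and vertical neighbors only, no diagonal ones) can be illustrated with a brute-force check on block geometry. This is only a sketch of the rule itself; it is not the tree-based neighbor finding strategy of [Samet82b], and the `(x, y, size)` block layout and function names are assumptions.

```python
def edge_adjacent(a, b):
    """True if square blocks a and b share a boundary segment of positive
    length. Blocks that touch only at a corner point are not adjacent."""
    (ax, ay, asize), (bx, by, bsize) = a, b
    overlap_x = min(ax + asize, bx + bsize) - max(ax, bx)
    overlap_y = min(ay + asize, by + bsize) - max(ay, by)
    # An overlap of exactly zero on one axis means the blocks abut on it.
    return (overlap_x == 0 and overlap_y > 0) or (overlap_y == 0 and overlap_x > 0)

def free_neighbors(block, free_leaves):
    """Non-obstacle leaf nodes adjacent to `block` (brute force)."""
    return [b for b in free_leaves if b != block and edge_adjacent(block, b)]

a = (0, 0, 2)          # 2x2 block at the origin
b = (2, 0, 1)          # 1x1 block touching a's right edge
c = (2, 2, 2)          # touches a only at the corner point (2, 2)
d = (3, 0, 1)          # separated from a by one pixel
print(free_neighbors(a, [a, b, c, d]))
```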

The result of applying the above A* algorithm to the
quadtree is a list of nodes from the quadtree (ordinarily
of varying sizes) which define a set of paths between the
start and goal nodes. If desired, an optimal path through
these blocks can be computed, or the center points of
consecutive blocks on the list can be connected to compute
a negotiable path.

2.3.  Results 

Figure 2.2 contains a simple example of a path
obtained using this algorithm. Figure 2.2(a) is a binary
array with start and goal points marked, along with an
indication of the path determined by the algorithm.
Figure 2.2(b) contains the tree data structure that represents
the quadtree, in which the blocks on the computed path
are marked with p's. It is important to note the reduction
in the number of nodes achieved by the algorithm.
Figure 2.3(a) shows a path planned on a more complicated
image map with the constant $\alpha$ set to 1 and Figure
2.3(b) shows the same example with $\alpha$ set to zero. Notice
that the time taken in the former case is considerably
higher than in the latter. This should be expected, since,
as noted in the last section, the heuristic power of h
reduces as $\alpha$ increases.

It is also interesting to note that although it is true
that the quadtree representation is sensitive to displacements
of obstacles with respect to the grid boundaries,
the savings in space and computation afforded by this
method are still very high on the average. Further,
Samet et al. [Samet84] point out that for complicated
images the positioning of the image origin is likely to
have little effect on the number of nodes in the resultant
quadtree.


2.4.  Advantages  of  the  quadtree  approach 

Compared to the first category of path planning
algorithms mentioned in the introduction, such as the
grid search method, the path planning cost for quadtree
based search will be substantially lower because the
number of nodes to be searched in the quadtree approach
is considerably smaller. In fact, the number of leaf nodes
in a quadtree of an image map having polygonal obstacles
is approximately O(p) [Samet83], where p is the sum
of the perimeters of the (polygonal) obstacles in terms of
the lowest resolution units, in our case pixels (or grid
points). Thus A* search will only have to deal with about
O(p) nodes in the case of a quadtree, instead of the $n^2$
grid points in the case of a grid search, a substantial
reduction. Similarly the "local-bound" behavior of the
first category algorithms is absent in this approach,
because the nodes are on the average much larger than
single pixels and it is straightforward to determine the
"nearness" of the nodes to the obstacles. A hierarchy of
different levels of description of the space that is available
with quadtrees enables us to search for a path close to
obstacles only when necessary. Corner-clipping, inflexible
paths are eliminated by considering only neighbors in the
horizontal and vertical directions.

Unlike the second category of methods that involve a
costly change of representation, the proposed approach
has a very small representation overhead. As pointed out
in Section 2.1, both the representation algorithms
involved are of complexity O(n), whereas many methods
of the second category have a representation cost which is
far higher.

Thus quadtree based path planning is a good
compromise between free space algorithms and grid
search type algorithms. In addition, the path produced
by the quadtree algorithm, although not "optimal", is a
"negotiable" path which can be computed relatively
quickly. Apart from this, the hierarchical nature of the
representation gives many advantages in path planning.
For example,

[a] We can easily constrain the path to satisfy certain
conditions, such as specification of minimal clearance
of the path.

[b] More importantly, we can make the search staged,
i.e., plan a path at a coarser level and subsequently
refine it as needed, thus reducing the planning cost
substantially.

The former advantage has been discussed in Section 2.2.
We will discuss the latter at greater length in the next
section.

3. STAGED PATH PLANNING

3.1. Motivation

Though the algorithm which we presented in the
previous section is relatively efficient, it can be improved
upon substantially. We often get undesirably small
"black" (obstacle) nodes in the quadtree representation.
One obvious source for this may be the existence of very
small obstacles in a region of the environment that is
otherwise obstacle free. A more important source of these
black nodes is the representation of irregular obstacles in
quadtrees. Due to the recursive nature of the quadtree,
these small black nodes will fragment the free space, giving
rise to an undesirable increase in the depth of the
quadtree and the number of leaf nodes, consequently
increasing the cost of the search.

We can deal with this problem by first planning the
path in a reduced resolution quadtree, called a pruned
quadtree, that contains gray leaf nodes, corresponding to
mixtures of obstacles and free space. This implies that a
node can now have gray neighbors. An algorithm which
is capable of planning a global path at this coarser level,
and subsequently developing the path inside the gray
nodes (which are included in the global path) in the
second stage, can give rise to savings in terms of computation,
without significant degradation of the path
obtained. As mentioned in Section 2.4, the number of
leaf nodes is on the order of the sum of the perimeters of
the obstacles, measured in the lowest resolution units.
Thus conducting search at a resolution $l$ levels below the
pixel resolution reduces the "sum of the perimeters" and
"number of leaf nodes" by a factor of $2^l$, thereby
substantially reducing the time complexity of the search.

There  are  two  aspects  to  this  staged  search  that 
deserve  detailed  attention — the  treatment  of  gray  leaf 
nodes  during  planning  and  the  generation  of  pruned 
quadtree  from  the  original  quadtree.  In  the  next  two  sub¬ 
sections  we  shall  discuss  these  two  aspects  in  detail. 

3.2. Dealing with gray leaf nodes

When planning a path through the pruned quadtree,
we have to deal with gray leaf nodes. Specifically, the
following three questions must be answered:

(1) What is done when one of the neighbors of the
current best node (the node that is currently being
expanded in the A* search) is a gray node?

(2) How is the current path expanded when the current
best node is a gray node?

(3) How is the first stage path, involving gray leaf nodes,
processed to get the final path that contains free
nodes exclusively?

We shall address these in the following subsections.

3.2.1. Gray leaf neighbors

If one of the neighbors, N, of the current best node,
B, is a gray node then before putting N on the OPEN
list, we must ensure that N can be entered from B. If
B is a free node then N can be entered iff there exists at
least one free node, m, in S(N) such that m is adjacent
to B. If, instead, B itself is a gray node then N can
be entered from B as long as there exists a free node, e,
in S(B) such that e is adjacent to m. Note that checking
this entry condition alone does not guarantee that the
gray node N is passable, i.e., that a path from B
through N to a third node, C, exists. For example in
Figure 3.1, N can be entered from B, through the free
node m, but N cannot be exited, except back to B.


If we decide to put N on the OPEN list then we
shall include in the heuristic value of N a measure of the
"path complexity", c(N), inside N. (This measure
should be zero for a free node since the path inside the
free node can be a straight line.) In general, it is difficult
to give a measure which truly represents the complexity
of a path inside the gray node, since at this point in the
search the direction in which the path will be exiting the
gray node is unknown. But in practice any measure
depending upon the gray content (number of obstacle
pixels inside the gray node) of the gray node will be a
good choice. One such normalized complexity measure
for the gray node N is

$$c(N) = \frac{\text{gray content}(N)}{\text{size}(N)}$$


Given two gray nodes having the same gray content, the
path complexity should intuitively be higher for the gray
node representing a region with more obstacle nodes.
Thus, a better, although costlier, complexity measure of
the gray node N will take into account the number of
obstacle nodes in S(N).

Once the heuristic value is calculated, the gray node
is placed on the OPEN list and it can be selected for
expansion whenever its f-value is the best among the
nodes on the OPEN list.

3.2.2. Expanding gray nodes during search

When the current best node, B, happens to be a
gray node, expanding B becomes a more involved operation.
After generating B's neighbors we must ensure
that for each of these neighbors, N, there exists a path
through B that connects B's predecessor, P, on the
current path to N (see Figure 3.2). We refer to this as
the "reachability" analysis for neighbor N. Secondly, for
each neighbor N that can thus be reached we have to
record as N's g-value an estimate of the shortest path to
N through B. This estimate should take into account
the fact that the shortest path through B may not be a
straight line path since B is a gray node.

One way to achieve the above two objectives is by
performing an A* search rooted at B to determine if N
can be reached from P. If the A* search finds such a
path to N, then we can use the cost of that path as the
g-value of neighbor N. The advantage of this method is
that we have the full power of A* search. The principal
disadvantage to this method is that we need to perform
this A* search once for every neighbor of B, a rather
large price to pay for path optimality.

To avoid the above disadvantages associated with A*
search we elected to compute a distance transform of the
gray node as a way of dealing with the problems of
gray node expansion.

Let f be a free node in S(B) such that f is adjacent
to B's predecessor, P. Notice that there can be more
than one such free node in S(B). If P is a gray node,
then we require that f be adjacent to a free node in S(P)
(called an "exit node" for P). This exit node would have
been determined when P was being expanded. We illustrate
all this in Figure 3.2. P is the predecessor of the
best node, B, and N is a neighbor of B. Both f and f'
are free nodes in S(B). They are also adjacent to P. In
such a situation, we choose the free node which has the
least straight line distance to the goal node, in this case
f. Thus the current path enters B through f; f is
recorded as the entry node of B.

Next, we compute a distance transform of the region
represented by B, with respect to f. This involves
recording for each free node, f', in S(B), f''s shortest
distance (which we refer to as dis(f, f')) from f. To
carry out this computation, we first initialize dis(f, f) to
zero (see Figure 3.2), and dis(f, f') for all other free
nodes, f', in S(B) to $\infty$. Next, we carry out the propagation
step: we find all the neighbors of f, f', which are
in S(B) and for each such neighbor, f', calculate
dis(f, f') as the sum of dis(f, f) and the nodal distance
between f and f', D(f, f'). To ensure that the path
inside B will take clearance from the obstacles into
consideration, we include the cost of the node d(f') (see
Section 2.2) in dis(f, f'). We repeat this propagation step for
all the neighbors of f, with the neighbors taking the role
of f, and so on, until we exhaust all the free nodes in
S(B). The detailed procedure is given in an algorithmic
fashion in Table 3.1, and is, essentially, the familiar
shortest path algorithm for the case of "single source
multiple destinations" (see, e.g., [Horo82]).
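The propagation described above can be sketched as a single source shortest path computation over the free nodes of S(B). The sketch below uses a priority queue (Dijkstra's algorithm) as one way to organize the propagation; it is not a transcription of Table 3.1. Blocks are assumed to be `(x, y, size)` tuples, `d_cost` stands for the clearance penalty d(f') of Section 2.2, and all names are hypothetical.

```python
import heapq
import math

def adjacent(a, b):
    """Edge adjacency of two square blocks (corner contact does not count)."""
    (ax, ay, s), (bx, by, t) = a, b
    ox = min(ax + s, bx + t) - max(ax, bx)
    oy = min(ay + s, by + t) - max(ay, by)
    return (ox == 0 and oy > 0) or (oy == 0 and ox > 0)

def distance_transform(entry, free_nodes, d_cost):
    """Return dis(f, f') for every free node f' in S(B), f being `entry`."""
    dis = {n: math.inf for n in free_nodes}
    dis[entry] = 0.0
    heap = [(0.0, entry)]
    while heap:
        dist, cur = heapq.heappop(heap)
        if dist > dis[cur]:
            continue                               # stale queue entry
        for nxt in free_nodes:
            if nxt == cur or not adjacent(cur, nxt):
                continue
            # nodal distance D (half the sum of sizes) plus node cost d
            step = (cur[2] + nxt[2]) / 2.0 + d_cost(nxt)
            if dist + step < dis[nxt]:
                dis[nxt] = dist + step
                heapq.heappush(heap, (dist + step, nxt))
    return dis

# Free nodes inside a 4x4 gray node; `iso` is cut off by obstacles.
f, a, e, iso = (0, 0, 1), (1, 0, 1), (2, 0, 2), (0, 3, 1)
dis = distance_transform(f, [f, a, e, iso], d_cost=lambda n: 0.0)
```

Free nodes that keep an infinite value are unreachable from the entry node inside the gray node.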

Having computed the distance transform of B with
respect to f, as detailed above, we are now ready to continue
with the expansion of B. For each of B's neighbors,
N, N is marked reachable if there exists a free
node, e, in S(B) which satisfies the following two conditions
(see Figure 3.2):

(1) dis(f, e) < $\infty$. This ensures that there is a path
between e and f inside S(B).

(2) N can be entered from e. As discussed in Section
3.2.1, if N is a free node, this condition is satisfied
as long as N and e are adjacent. If, on the other
hand, N is a gray node then the condition is
satisfied if there exists a free node m in S(N) such
that m and e are adjacent.

The node, e, satisfying the above two conditions is
marked as the exit node of gray node B with respect to
N. Notice again that there may be more than one such
node. For example, in Figure 3.2, e and e' satisfy both
conditions, since there is a path from f to each of these
nodes, and N can be entered from both the nodes. In
such a situation, we select the node with smaller distance
to f as the exit node. Thus, in Figure 3.2, e would be
chosen as the exit node of B with respect to N.
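The exit node selection can be sketched as follows. The distance transform values are assumed to be available in a dictionary `dis`, and the entry test of condition (2) is assumed to be supplied as a predicate `can_enter`; both names, and the string labels used for nodes in the example, are hypothetical.

```python
import math

def choose_exit(dis, free_in_B, can_enter):
    """Pick the exit node of gray node B with respect to a neighbor N.

    dis:        dict mapping free nodes of S(B) to dis(f, node)
    free_in_B:  iterable of free nodes in S(B)
    can_enter:  predicate, True if N can be entered from the given node
    Returns None when N is not reachable through B.
    """
    candidates = [e for e in free_in_B
                  if dis.get(e, math.inf) < math.inf and can_enter(e)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: dis[e])   # smallest distance to f

# e and e2 both border N; x borders N but is unreachable from f.
dis = {"e": 3.0, "e2": 2.0, "x": math.inf, "y": 1.0}
borders_N = {"e", "e2", "x"}
exit_node = choose_exit(dis, dis.keys(), lambda n: n in borders_N)
```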

Neighbor N of the best node B is placed on the
OPEN list only if there exists an exit node, e, for B with
respect to N. If N does go on to the OPEN list, the
sum of the g-value of B's predecessor P, g(P), and
dis(f, e) is recorded as g(N). If N is a gray node, we
have to include in its heuristic value, h(N), an estimate
of the path complexity inside N as discussed in Section
3.2.1. This completes the discussion of the expansion of
the best gray node B.

At this point it is worth noting the advantages of
using the distance transform in dealing with gray leaf
nodes. First, it eliminates the necessity of multiple
rooted A* searches. The distance transform computation
is efficient on the quadtree representation. Second,
developing the path inside the gray nodes, after the first
stage, is very simple (see below).

3.2.3. Developing the first stage path containing gray nodes

At the end of the first stage of the staged search the
planned path may contain gray nodes as well as free
nodes. The path inside the gray nodes is developed in
the second stage.

If rooted A* search were used in expanding gray
nodes (as discussed in the previous subsection), then this
second stage would simply amount to concatenating
these paths through gray nodes with the free nodes.

If the distance transform is used instead of rooted A*
search, then the path development inside gray nodes is
not as simple. The path development computation
involves the following (refer again to Figure 3.2):

For each gray node B on the path we retrieve B's
entry node f (recorded while expanding B) and B's
predecessor P and successor N on the path. Next, using
N, we retrieve the exit node, e, for B corresponding to
N. Now developing the path inside B amounts to
finding the shortest path between e and f and inserting
it in between N and P. Finding the shortest path
between e and f simply involves backing up to f
through neighbors having smallest distance transform
values. In Figure 3.2, for example, the shortest path
between e and f, as found by this method, passes from e
through the intermediate free nodes labeled d in the
figure to f.
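The backing-up step can be sketched as a descent over the stored distance transform values: starting at the exit node, repeatedly step to the adjacent free node with the smallest dis value until the entry node is reached. The sketch follows the rule as stated in the text; the block layout `(x, y, size)` and all names are assumptions made for the example.

```python
import math

def adjacent(a, b):
    """Edge adjacency of two square blocks (corner contact does not count)."""
    (ax, ay, s), (bx, by, t) = a, b
    ox = min(ax + s, bx + t) - max(ax, bx)
    oy = min(ay + s, by + t) - max(ay, by)
    return (ox == 0 and oy > 0) or (oy == 0 and ox > 0)

def backtrack(exit_node, entry, dis):
    """Path from the exit node e back to the entry node f inside a gray node.

    dis maps each free node of S(B) to its distance transform value dis(f, .)
    """
    if dis[exit_node] == math.inf:
        raise ValueError("exit node is not reachable from the entry node")
    path = [exit_node]
    cur = exit_node
    while cur != entry:
        lower = [n for n in dis
                 if n != cur and adjacent(cur, n) and dis[n] < dis[cur]]
        cur = min(lower, key=lambda n: dis[n])   # neighbor with smallest value
        path.append(cur)
    return path

# Distance transform values for three free nodes in a row plus an isolated one.
f, a, e, iso = (0, 0, 1), (1, 0, 1), (2, 0, 2), (0, 3, 1)
dis = {f: 0.0, a: 1.0, e: 2.5, iso: math.inf}
inner_path = backtrack(e, f, dis)
```

The returned list runs from e to f; it would be reversed before being inserted between P and N on the first stage path.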

3.3. Pruned quadtree generation methods

The primary motivation for pruned quadtree based
staged search, as noted in Section 3.1, is to offset the
disadvantages of the fixed grid uniform recursive
decomposition involved in quadtree representation. By
choosing an appropriate pruned quadtree, we can avoid a
profusion of nodes in a region of the image map which is
relatively obstacle free. This poses the question of how to
decide when a region, or the gray node representing it, is
relatively obstacle free. None of the simple measures
(such as grayness of the node) alone can answer this question
entirely satisfactorily. For example, the grayness of
a node tells us nothing about the distribution of the
obstacles in the region represented by that node, and in the
extreme case a small value of grayness may actually be
the result of a streak of obstacle pixels through the middle
of the region. More commonly, a small grayness value
of a gray node may be due to a scattered obstacle
distribution inside the gray node, which fragments the free
space. In such a case the gray node is obviously a bad
candidate for a leaf node in the pruned quadtree. At the
same time, we do not want to base our decision on a very
involved analysis of the gray node, because this may
increase the cost of pruned quadtree generation to the
point where the staged search is, overall, less efficient
than searching the original quadtree.

The method proposed uses a threshold on L(G), the
number of leaf nodes in S(G), to identify leaf nodes of
the pruned quadtree. Any gray node, G, whose L(G) is
lower than the threshold is made a leaf node of the
pruned quadtree in a breadth first traversal of the
quadtree. Computation of L(G) is straightforward. For a
given threshold, there is an upper bound on the cost of
gray node evaluation based on the distance transform,
and thus the cost of the staged search can be effectively
controlled.
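The thresholding rule can be sketched as below. The `QNode` class, its field names, and the `prune` function are assumptions made for illustration; only the rule itself (a gray node whose leaf count L(G) is below the threshold becomes a leaf of the pruned quadtree during a breadth first traversal) is taken from the text.

```python
from collections import deque


class QNode:
    """Hypothetical quadtree node: color is "free", "obstacle" or "gray"."""

    def __init__(self, color, children=None):
        self.color = color
        self.children = children or []   # four children for a gray node
        self.pruned_leaf = False


def leaf_count(node):
    """L(G): number of leaf nodes in the subtree rooted at node."""
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children)


def prune(root, threshold):
    """Breadth first traversal returning the leaves of the pruned quadtree.

    A gray node with L(G) below the threshold is kept as a (gray) leaf;
    otherwise its children are examined in turn.
    """
    leaves = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.color != "gray":
            leaves.append(node)
        elif leaf_count(node) < threshold:
            node.pruned_leaf = True
            leaves.append(node)
        else:
            queue.extend(node.children)
    return leaves
```

For brevity the sketch recomputes L(G) at every visited gray node; storing the count in each node during a single bottom-up pass would avoid the repeated work.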

One important advantage of this method is that the
threshold on L(G) is relatively independent of the
specific image, and depends only on global criteria such
as maximum allowable gray node evaluation cost and
maximum allowable suboptimality of the resultant path.
Figure 3.3(b) shows a pruned quadtree generated using
this method from the quadtree in Figure 3.3(a) and also
gives the result of a staged search on this pruned
quadtree.

3.4. Results of the staged search

Figures 3.4(I-IV) depict the paths found by pure A*
search on the original quadtree, the first stage of staged
search on the pruned quadtree (with gray nodes in the
path), and the second stage of staged search (after paths
inside the gray nodes are developed). The pruned
quadtree used in the staged search is generated automatically,
as discussed in the section on pruned quadtree generation.
Each of the figures lists the cpu time taken for path
planning, the number of nodes considered by the search versus
the total number of leaf nodes, and details of the method of
pruned quadtree generation used.

The path generated by the staged search is
comparable to the optimal path generated by the pure A* search.
However, the total cpu time taken (with compiled Franz
Lisp running on a VAX 11/785) by staged search (for
pruned quadtree generation, A* search and second stage
path development) is 3 to 10 times less than that taken
by the pure A* search. (See Figures 3.4(I-IV) for detailed
timings for the examples presented. The timings are in
cpu seconds and involve substantial page swapping
overhead.) Our experiments show that the computational
savings are much higher for cluttered environments than
for relatively free environments; compare Figures 3.4(I)
and 3.4(IV), for example. This is reasonable since the
fragmentation of free space is much higher in cluttered
environments.
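As a rough check on the "3 to 10 times" claim, the ratio of pure A* time to staged search time can be worked out from the timings printed in the figures. The sketch below uses only the three figures whose two timings are both legible in this copy; the dictionary layout is an illustration, not part of the report.

```python
# (pure A* cpu seconds, staged search cpu seconds) as printed in the figures
timings = {
    "Figure 3.3": (53, 13),
    "Figure 3.4(II)": (84, 16),
    "Figure 3.4(III)": (38, 12),
}

# Speedup of staged search over pure A* for each example
speedups = {name: pure / staged for name, (pure, staged) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

These three examples give ratios of roughly 4.1, 5.3 and 3.2, which fall inside the stated range.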

4.  RELATED  WORK 

As pointed out in Section 1, hierarchical
representations have been used previously in manipulator findpath
tasks. In this section we discuss some of that previous
work in relation to our own.

Wong et al. [Wong85] use a modified version of
quadtrees to solve 3-D findpath problems by planning a
path in the three orthogonal 2-D projections of the 3-D
environment. Their approach essentially searches for a
path in a "point based" quadtree representation. (See
[Samet83] for a comparison between "region based" and
"point based" quadtrees.) Faverjon [Faverjon84] uses
octrees (an extension of quadtrees to 3-D) for reducing
the time complexity of the 3-D findpath problem for a six
joint manipulator.

Lozano-Perez [Lozano81] represented free space in
the "configuration space" as a hybrid hierarchical
structure consisting of rectanguloid and polyhedral cells. He,
however, planned a cell path strictly among the free cells
of the representation, thus missing the computational
advantages of hierarchical staged search. Another
problem with his approach was that the path search could fail
because the resolution of the representation was not fine
enough. Brooks and Lozano-Perez later remedied these
problems in [Brooks83b]. The approach presented in
their paper comes closest to our "staged search"
approach. They cut the free space hierarchically into full
(obstacle), empty (free), and mixed rectanguloid cells,
with the mixed cells representing areas of unexplored
configuration space. They first try to plan a path
exclusively through the free cells. If that fails, they then
repeat the search, this time considering the mixed cells
also. Next, for each mixed cell in the cell path, they try
to develop a path through the mixed cell. If any of the
mixed cells turns out to be impassable, then they may
have to repeat the search again, finding another
free-mixed cell path. Since they use the A* search algorithm
as the main engine for all these different searches, the
overall process turns out to be very expensive
computationally. Both [Lozano81] and [Brooks83b] refine their cell
paths into point paths, since the cell path in
configuration space represents a set of possible solutions
to the findpath problem.

5.  CONCLUSIONS

In this report we have presented methods of short
range path planning for mobile robots, using quadtree
hierarchical data structures. We demonstrated the merits
of quadtree based path planning and also discussed in
detail a method of staged path planning, with improved
computational cost compared to pure quadtree based
single stage path planning.

Lozano-Perez [Lozano81] observes that the most
important heuristic for a path planning space
representation is to avoid excess detail (and therefore time spent)
on parts of the space which do not affect the planning
operation. The quadtree representation naturally
provides such a description of free space. Short range
planning for a mobile robot should be based on decomposition
of free space into units larger than pixels for the search to
be global. Hierarchical decompositions like the quadtree
are a good way to achieve this, especially since the
representation cost involved is small. They obviously are
not as optimal as decomposing free space into channels or
more natural shapes, but the latter methods have a
higher representation cost. Some of the suboptimality of
uniform grid recursive decomposition involved in
quadtree representation is offset by the staged version of the
path planner. Another important use of staged search in
dynamic path planning is that it offers an elegant way of
treating uncharted areas. These can be represented as
gray nodes with very high cost, and when they get
included in the search, further processing can be
expended to "chart" those regions.

Looking further, the mobile robot needs to
continually update the planned path, as it traverses it, in the
light of new information. To do this efficiently, we need
to be able to "add to" and "delete from" (or update) the
representation of the image map with relatively low cost.
In the context of dynamic path updating, one desirable
property of a free space representation is that the
individual obstacles affect the representation only in their
immediate locality. A disadvantage of quadtree
representation of free space is that it does not localize the effect
of obstacles on the representation. This is a general
shortcoming of representations which cut free space into
rectanguloid cells. In contrast, the generalized cone
representation of free space described in [Brooks83a]
satisfies this property. Presently we are concentrating on
efficient methods of path updating for the quadtree based
planning methods discussed in this report.


REFERENCES 


[Brooks83a]

Brooks, R. A., "Solving the findpath problem by good
representation of free space," IEEE Transactions on Systems, Man,
and Cybernetics 13, 1983, 190-197.

[Brooks83b]

Brooks, R. A. and Lozano-Perez, T., "A subdivision algorithm
in configuration space for findpath with rotation," in
Proceedings, Eighth International Joint Conference on Artificial
Intelligence, Karlsruhe, W. Germany, 1983.

[Davis85]

Davis, L. S., Andresen, F., Eastman, R., Kambhampati, S.,
"Visual algorithms for autonomous navigation," in
Proceedings, IEEE International Conference on Robotics and
Automation, St. Louis, MO, March 1985.

[Faverjon84]

Faverjon, B., "Obstacle avoidance using an octree in the
configuration space of a manipulator," in Proceedings, IEEE
International Conference on Robotics, Atlanta, GA, March
1984.

Horowitz, E. and Sahni, S., Fundamentals of Data Structures,
Chapter 6, Computer Science Press, Rockville, MD, 1982.

Lozano-Perez, T. and Wesley, M. A., "An algorithm for
planning collision-free paths among polyhedral obstacles,"
Communications of the ACM 22, 560-570.


[Lozano81]

Lozano-Perez, T., "Automatic planning of manipulator transfer
movements," IEEE Transactions on Systems, Man, and
Cybernetics 11, 1981, 681-698.

[Moravec81]

Moravec, H., "Rover visual obstacle avoidance," in
Proceedings, Seventh International Joint Conference on Artificial
Intelligence, Vancouver, BC, Canada, 1981.

[Nilsson69]

Nilsson, N. J., "A mobile automaton: an application of artificial
intelligence techniques," in Proceedings, First International
Joint Conference on Artificial Intelligence, Washington, DC,
1969.

[Nilsson80]

Nilsson, N. J., Principles of Artificial Intelligence, Chapter 2,
Tioga, Palo Alto, CA, 1980.

[Samet81]

Samet, H., "An algorithm for converting rasters to quadtrees,"
IEEE Transactions on Pattern Analysis and Machine
Intelligence 3, 1981, 93-95.

[Samet82a]

Samet, H., "Distance transform of images represented by
quadtrees," IEEE Transactions on Pattern Analysis and Machine
Intelligence 4, 1982, 298-303.

[Samet82b]

Samet, H., "Neighbor finding techniques for images represented
by quadtrees," Computer Graphics and Image Processing 18,
1982, 37-57.

[Samet83]

Samet, H., "The quadtree and related hierarchical data
structures," University of Maryland Center for Automation Research
Technical Report 23, November 1983.

[Samet84]

Samet, H. et al., "Application of hierarchical data structures to
geographical information systems: Phase III," University of
Maryland Center for Automation Research Technical Report
99, p. 59, November 1984.

[Thompson77]

Thompson, Alan M., "The navigation system of the JPL
robot," in Proceedings, Fifth International Joint Conference on
Artificial Intelligence, Cambridge, MA, 1977.

[Thorpe84]

Thorpe, C., "Path relaxation: path planning for a mobile
robot," in Proceedings, National Conference on Artificial
Intelligence, Austin, TX, 1984.

[Wallace83]

Wallace, R., "Two-dimensional path planning and collision
avoidance for three-dimensional robot manipulators," in
Representation and Processing of Spatial Knowledge, University
of Maryland Department of Computer Science Technical
Report 1275, May 1983.

[Wong85]

Wong, E. K. and Fu, K. S., "A hierarchical orthogonal space
approach to collision-free path planning," in Proceedings,
IEEE International Conference on Robotics, St. Louis, March
1985.


[Figure 2.3 panel annotations: TIME 23, EXPANDED 191/499,
PURE A* (ALPHA 1); TIME 11, EXPANDED 91/499, PURE A* (ALPHA 0)]

Figure 2.3 Example of single stage planning


Procedure Distrans(B, f);


/* B is the gray node representing the region in
which f is a free node. The algorithm computes the
distance transform of B with respect to f. */


foreach n in B do dis(f, n) <- infinity;

dis(f, f) <- 0;

distrans_OPEN <- build_list(f);

until null(distrans_OPEN)

do

    l <- first(distrans_OPEN);

    distrans_OPEN <- rest(distrans_OPEN);

    NBRS <- get_neighbors_inside_the_node(l, B);
    foreach nbr in NBRS
    do

        dis(f, nbr) <- min{D(l, nbr) + alpha * node_cost(nbr)
                               + dis(f, l),
                           dis(f, nbr)};

        distrans_OPEN <- append(distrans_OPEN, nbr);
    od
od

end Distrans.


Table 3.1 Distance transform algorithm
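The procedure in Table 3.1 can be illustrated with the following sketch. It is not the authors' implementation: it assumes the gray node has been reduced to unit grid cells with 4-connectivity, takes the inter-cell distance D as 1, and (an assumption added here so the loop is guaranteed to terminate) re-queues a neighbor only when its distance value actually improves. The names `distrans`, `free_cells` and `node_cost` are hypothetical.

```python
import math
from collections import deque


def distrans(free_cells, entry, alpha=0.0, node_cost=None):
    """Distance transform of the free cells of a gray node with respect
    to the entry cell, by repeated relaxation from an OPEN list."""
    node_cost = node_cost or {}
    dis = {cell: math.inf for cell in free_cells}
    dis[entry] = 0.0
    open_list = deque([entry])
    while open_list:
        cur = open_list.popleft()
        r, c = cur
        for nbr in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nbr not in dis:
                continue  # obstacle cell or outside the gray node
            # D(cur, nbr) is 1 for adjacent unit cells in this sketch
            candidate = dis[cur] + 1.0 + alpha * node_cost.get(nbr, 0.0)
            if candidate < dis[nbr]:
                dis[nbr] = candidate
                open_list.append(nbr)  # re-queue only on improvement
    return dis


# 3x3 block whose centre cell (1, 1) is an obstacle
cells = {(r, c) for r in range(3) for c in range(3)} - {(1, 1)}
plain = distrans(cells, entry=(0, 0))
weighted = distrans(cells, entry=(0, 0), alpha=1.0, node_cost={(0, 1): 5.0})
```

With `alpha` set to 0 the result is the plain shortest-path distance inside the node; a positive `alpha` steers the distances away from cells with a high `node_cost`.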


Figure 3.1 Dealing with a gray neighbor


Figure 3.2 Expanding a gray node
In the figure, B represents the current best node in the A* search and N
is one of its neighbors. S and G represent start and goal nodes
respectively. The path to B ends with the nodes P - B,
in that order. Thus P is B's predecessor on the current path. Of the
two nodes of B that are adjacent to P, f is nearer to G, so f is the
entry node of B. Of the four nodes of B that are adjacent
to node N, two cannot be reached from f.
Thus e and e' are the possible candidates for exit nodes. e is
chosen as the exit node for B corresponding to N since e is nearer to f
than e'. The chain of nodes from e to f represents the path that will be
developed inside B during the second stage of path development.


[Figure 3.3 panel annotations: single stage planning on the
original quadtree: TIME 53, EXPANDED 345/737, PURE A*;
staged path planning with leaf node thresholding as the pruned
quadtree generation strategy: TIME 13, EXPANDED 24/31,
LEAF-THRESH 50]

Figure 3.3 Examples of pruned quadtree generation strategies


[Figure 3.4(I) panel annotations: (a) TIME (value missing),
EXPANDED 283/514, PURE A*; (b) TIME 12, EXPANDED 21/22,
LEAF-THRESH 50]

Figure 3.4(I) Staged vs. single stage path planning (example 1)
(a) shows the results of single stage planning and (b) shows the
results of staged planning.


[Figure 3.4(II) panel annotations: (a) TIME 84, EXPANDED 460/835,
PURE A*; (b) TIME 16, EXPANDED 44/49, LEAF-THRESH 50]

Figure 3.4(II) Staged vs. single stage path planning (example 2)
(a) shows the results of single stage planning and (b) shows the
results of staged planning.

[Figure 3.4(III) panel annotations: (a) TIME 38, EXPANDED 272/688,
PURE A*; (b) TIME 12, EXPANDED 29/40, LEAF-THRESH 50]

Figure 3.4(III) Staged vs. single stage path planning (example 3)
(a) shows the results of single stage planning and (b) shows the
results of staged planning.


[Figure 3.4(IV) panel annotations: (a) TIME 16P, EXPANDED 542/2863,
PURE A*; (b) TIME 18, EXPANDED 29/124, LEAF-THRESH 50]

Figure 3.4(IV) Staged vs. single stage path planning (example 4)
(a) shows the results of single stage planning and (b) shows the
results of staged planning.




ABSTRACT

Current line-by-line stereo algorithms make
numerous wrong matches. We propose a method based
on figural continuity along linear segments, which detects
and corrects a certain type of those errors for edge based
stereo. This technique is extended to provide a
quantitative measure of performance of such stereo algorithms. A
few evaluation functions used for line-by-line stereo are
compared using this method.


ERROR DETECTION AND CORRECTION
FOR STEREO

Rakesh Mohan
Intelligent Systems Group
Departments of Electrical Engineering
and Computer Science
Powell Hall Room 224
University of Southern California
Los Angeles, California 90089-0273

In section 2, we discuss feature based stereo. In
section 3, matching errors in line-by-line stereo are
classified into two categories. A continuity constraint for
disparity values is derived from geometric principles in
section 4. The algorithm to detect and correct one of these
types of errors is presented in section 5. A measure of
performance for line-by-line stereo is defined in section 6.
In section 7 we compare the performance of a few match
evaluation functions. Conclusions and further research
areas are stated in section 8.




1. INTRODUCTION

The relative displacement in the position of objects,
as viewed by a pair of eyes, is an important source of
depth information for humans. This phenomenon, called
binocular stereopsis, can be used by computers to locate
objects in 3D. The difficulty faced in the implementation of
stereo is the task of identifying corresponding locations in
the two images. For a survey on stereo consult [1].

Our goal is to improve the performance of stereo
matching algorithms. We take the case of line-by-line
edge based stereo and propose two ways of improving
their performance:

1. Given the output of a stereo algorithm, i.e. the
   sparse map of the disparities obtained at the
   matched edge points, we wish to detect and correct
   errors made by the matching program. We classify
   the type of matching errors into two categories.
   Using a strict geometric constraint on the
   distribution of disparities along linear segments, we are able
   to identify and correct one type of error. Also, in the
   process we are able to fill in disparities for edges
   which have not been matched and some occluded
   edges.

2. We wish to improve the performance of the stereo
   program before the correction phase. The quality of
   a matching algorithm can be judged by the amount
   of errors it makes. Being able to detect one type of
   matching errors, we compare the performance of
   various match evaluation functions which could be
   used by the stereo algorithm and choose the one
   which gives the least amount of matching errors.


2. FEATURE BASED STEREO ALGORITHMS

Most feature based stereo algorithms use edges as
the feature primitive to be matched. Edges have an extent
of one pixel. Using an epipolar geometric constraint for
stereo, edges lying on an epipolar line are matched only
to edges lying on the corresponding epipolar line in the
other image. This type of stereo we term line-by-line
stereo. The main benefits of this technique are:

- The direct application of epipolar geometry forces a
  strong constraint on the search space.

- Global optimality has to be maintained only among
  edges on one epipolar line. These are much fewer
  than the edges in the whole image and hence need
  less computation.

- All the epipolar lines can be processed in parallel.

- The ordering among edges on an epipolar line is
  obvious and natural. This is useful for dynamic
  programming based algorithms.

- All features (edges) are of equal length and the
  match along the epipolar line is across their full
  length. Matching of features thus does not have to
  be normalized for lengths or amount of overlaps.
Intensity discontinuities in images correspond to
physical features in the imaged scene. These features
could be surface boundaries or surface markings. These
boundaries are connected, forming the following continuity
constraint on disparities:

    Disparity along a boundary changes smoothly,
    i.e. there should be no disparity discontinuities
    along a boundary.

In line-by-line stereo the epipolar lines are processed
independently of each other. Connectivity information
is therefore not used and the continuity constraint
is not applied.

The following methods have been proposed to use
the continuity constraint for line-by-line stereo:

1. Use disparity information from already matched
   neighboring epipolar lines to guide the matching [2].
   This technique is used mostly as a tool to cut
   computational costs by limiting the search space. The
   major drawbacks with this technique are:

   a. Wrong matches in a line can be forced by
      wrong matches in the neighboring lines. As
      this line will now be used to guide matching in
      other lines, errors become cumulative.
   b. The assumption made is that disparity changes
      slowly, which is a stronger statement than
      saying that it changes smoothly. Also, this
      assumption does not hold in general.
   c. Directional biasing: the results depend on
      whether the images are processed top to
      bottom or the other way around.

2. Use the continuity criteria after the complete
   matching process. For example:

   a. Arnold [3] chooses a suboptimal match which
      meets the continuity criterion (based on some
      statistics) better than the optimal one.
   b. In Baker [4], adjacent edges having a disparity
      difference greater than some value, based on some
      statistics, signal an error, and a cooperative
      process is used to detect and reject the wrong
      match.

   In these algorithms, the measure used does not
   follow from the geometric implications of continuity
   but relies on having nearly the same disparity along a
   boundary.

3. Use the continuity constraint for search in a 3D search
   space. Ohta and Kanade [5] use dynamic programming in
   three dimensions for intra-scanline search to find
   consistency among scanlines. Their technique is
   computationally very expensive, so only a few
   connected contours are used. It is also not clear if
   optimizing a path in the search space based on some
   cost will always maintain three dimensional
   continuity.

The above techniques do not propose disparities for edges
not matched or rejected as incorrectly matched, nor for
occluded portions of segments. Surface interpolation
techniques might be used to fill in disparities, but such
methods do not use the continuity constraint along
boundaries. The continuity constraint has also been used for
other types of stereo algorithms, for example [6] and [7].

Edges can be linked up into contours called
segments. These contours can be of arbitrary shape to
reflect the shape of the underlying feature, or they can be
linear approximations to it. The latter type of segments are
termed linear segments and have been used as the feature
primitive for stereo matching by Medioni and Nevatia [8].

The matching of segments incorporates boundary
connectivity directly. Also, segments are more complex
features than edges and thus contain more information
which could be used to give more precise match
evaluations than for edges. The segments can be as short as a
single edge, so segment based stereo does not break
down in the presence of isolated edges [5]. On the other
hand, such isolated edges should be treated with
suspicion, given the connected nature of physical features.
The problems inherent to segment based stereo matching
algorithms are:

- The problem is in identifying those properties of
  segments which can be used in matching, and in
  deciding how these properties can contribute to the
  confidence we can have in the matches. Medioni
  and Nevatia [8] use segment orientation and
  contrast, and difference of disparities among
  neighboring segments for evaluating a match.

- The use of epipolar geometry is not as
  straightforward as in line-by-line stereo. Although a match for
  a segment can be restricted to a window bounded
  by epipolar lines passing through its end points, we
  cannot ensure that a matching segment can be
  found which exactly fits this window. Matches for
  only parts of the segment are found. The matched
  segment often extends beyond the epipolar window.
  The lengths of a segment and its match may differ a
  lot. A segment may find more than one segment
  which it can match. These matching segments may
  be consistent in that they do not overlap along
  epipolar lines (i.e. along any given epipolar line the
  segment in one image has only one match in the
  other image) but they may propose different
  disparities for the segment they match. All these issues
  complicate the task of evaluating a match. Many of
  these problems stem from the poor quality of
  segment detectors.

- Global optimality has to be found among all the
  segments in the image. The number of segments in
  an image is much larger than the number of edges
  along an epipolar line, so ensuring global optimality is
  more expensive.

3.  ERRORS  IN  LINE-BY-LINE  STEREO 

It is our observation that in any line-by-line stereo,
a lot of edges are assigned wrong matches. Wrong
matches could be spurious null matches or matches to a
wrong edge. Edges might not be matched because the
match evaluation function used by the stereo matcher
assumes that the edge is spurious, or that its matching edge
in the other image is missing, or that the edge is occluded
in the other image. Wrong edge matches are of the
following two types:

Type I (local) errors:

In figure 1 the edge pairs matched by a stereo algorithm
are shown linked by the epipolar line. The contours
represent the segments detected in the image. The figure
shows that more edges of the segment are assigned
matches to the correct segment CD than to any single
wrong segment. These wrong matches arise due to the
fact that information along a single row of pixels may not
be a good basis for matching, and so there will be some
wrong matches on an epipolar line. Also, the real epipolar
line could be locally distorted due to the imaging device and
conditions [6] or due to noise.


Type I errors can be detected and corrected on the
basis of the continuity constraint.

Type II (global) errors:

Figure 2 shows another type of wrong match. Here more
edges of the segment AB are assigned matches to a
wrong segment EF than to the correct segment CD. We
term such errors Type II errors. These errors reflect
that the function for evaluating the quality of match used
by the stereo algorithm prefers the wrong segment as a
better match. This is a drawback of the match evaluation
function. We do not believe that any single evaluation
function can always avoid Type II errors.

Type II errors cannot be detected (or corrected) on
the basis of the continuity constraint. In fact, all the stereo
algorithms which use figural continuity along segments,
including segment based stereo, can directly deal with
only Type I errors.
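The distinction between the two error types can be made concrete with a small sketch. It assumes each edge of a segment carries the identifier of the segment its match falls on (or `None` if unmatched) and uses a plurality vote; both the data layout and the function name `split_by_plurality` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def split_by_plurality(edge_matches):
    """Return the segment receiving the most edge matches and the indices
    of edges matched to any other segment.

    If the dominant segment is the correct one, the returned edges are
    Type I error candidates. If the dominant segment is itself wrong
    (a Type II error), this vote cannot reveal it.
    """
    votes = Counter(m for m in edge_matches if m is not None)
    if not votes:
        return None, []
    dominant, _ = votes.most_common(1)[0]
    suspects = [i for i, m in enumerate(edge_matches)
                if m is not None and m != dominant]
    return dominant, suspects


# Edges of segment AB, top to bottom, with the segment each one matched to
matches = ["CD", "CD", "EF", "CD", None, "GH", "CD"]
dominant, suspects = split_by_plurality(matches)
```

Here the vote picks CD and flags edges 2 and 5; had most edges matched EF instead, the same code would report EF without complaint, which is exactly why Type II errors escape continuity-based checks.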

This classification of errors holds for all types of
segments. However, from this point on we will be dealing
exclusively with linear segments.

The more information we use to evaluate the
goodness of a match between two edges, the fewer matching
errors we should have. However, using more information
means increased computational expense, and the benefits
in terms of reduced error may not be proportional to the
extra cost. Thus a cheap way to detect and correct errors
would be useful in improving the performance of simple
evaluation functions.


Also, we do not know all types of information which
could be used to evaluate edge matches. Even with the
known metrics for judging the quality of a match, it is not
clear how much importance should be attached to
different measures. Nor do we know much about the
variation in performance of the measures due to changes
in the images. This situation is reflected in the fact that
the various line-by-line stereo algorithms in existence use
different cost functions. One reason for this state of affairs
is the difficulty in measuring quantitatively the
performance of a stereo algorithm, and so not much work has
been done on the comparison of the performance of
various evaluation functions. In section 7 we shall
compare a few cost functions used for line-by-line stereo
algorithms.

4. DISPARITY CHANGE ACROSS LINEAR SEGMENTS

Let us now consider the special case of linear
segments. A linear segment in an orthographic or perspective
projection of a 3D scene is the image of a linear feature
(or a linear approximation of a feature) in the 3D scene,
ruling out accidental alignments. In a stereo image pair
even such accidental alignments in one image will be
revealed in the other image.

Depth changes linearly along a straight line in 3D. Consider
figure 3 of the orthographic projection of a straight line in the
scene onto the image plane (x-y plane). Since

[equation illegible in the source]    (1)

we have

$\triangle ABC \sim \triangle CDE$

and

[equation illegible in the source]    (2)



We also note here that it is obvious from the equation of a line

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} = \frac{z - z_1}{z_2 - z_1} \qquad (3)$$

that the depth is linearly proportional to the displacement along
the x-axis (or the y-axis).

In stereo, matching gives us the disparity, not the depth. In the
image, the projection is perspective rather than orthographic.
Therefore, we cannot directly use the relationship derived in
equation (2). Consider figure 4, showing a linear segment in a
stereo pair. As is obvious from figure 4, disparity changes
linearly along the linear segments irrespective of the type of
projection, camera geometry, etc. We will use this strong
constraint to detect and correct wrong disparity values due to bad
matches.


Figure 3: Orthographic projection of a straight line


Figure 4: Matching linear segment in stereo pair


5. DETECTING AND CORRECTING TYPE I ERRORS

We shall use the connectivity constraint for linear segments to
maintain inter-epipolar line consistency. In section 4 we showed
that disparity along a linear segment varies linearly. We use this
strong constraint to detect Type I errors.

We work in a length-disparity (l-d axes) coordinate frame. First a
linear segment is picked. The disparities obtained by the stereo
matching program for the edges belonging to this segment are
plotted as a function of the distance of the edge from one end of
the segment. Since disparity varies linearly along a linear
segment, if all edges are matched correctly, all the points
plotted should fall on one straight line, as shown in figure 5.


Figure 5: Plot of ideal disparity versus length on a segment

Figure 6, however, is a more typical output. If we can interpolate
a line through this data which would correspond to the correct
disparity plot, we could not only detect and correct wrong matches
but also fill in disparity information at edges which were not
matched and at edges belonging to occluded portions of the
segment.

A traditional method to fit a straight line through experimental
data with errors is to interpolate a line which minimises the
square of the errors. However, we cannot directly apply this here,
because the least squared error line is valid only when the
distribution of errors about the correct values is Gaussian. In
the plot of disparities along a segment this does not hold. Since
a wrong match means that the edge was matched to an edge belonging
to a segment in the neighborhood of the correct segment, the
distribution of wrong matches is usually strongly biased to one
side of the correct value. For example, if the segment has high
disparity then most of the wrong matches give a low disparity (and
vice versa). Also, because of the spacing between the segments,
the amount of error in disparity caused by a wrong match is
usually large. As the least squared error line uses the square of
the errors for interpolation, even a few wrong matches used for
interpolation could pull the line substantially off the correct
position. Fischler and Bolles [9] have presented a technique to
deal with erroneous data; we will use another method more suited
to our task domain.
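
The effect of one-sided wrong matches on a least-squares fit can be illustrated with a small sketch. The data below are hypothetical (they are not taken from the experiments in this paper): a segment whose true disparity is d = 0.5 l + 20, with three edges wrongly matched to a neighbouring low-disparity segment.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line d = a*l + b."""
    n = len(points)
    mean_l = sum(l for l, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((l - mean_l) ** 2 for l, _ in points)
    sxy = sum((l - mean_l) * (d - mean_d) for l, d in points)
    slope = sxy / sxx
    return slope, mean_d - slope * mean_l


# Hypothetical segment: true disparity d = 0.5*l + 20 at edge positions l = 0..9.
correct = [(l, 0.5 * l + 20.0) for l in range(10)]

# Three edges wrongly matched to a neighbouring segment of low disparity (d = 5).
corrupted = [(l, 5.0) if l in (2, 5, 8) else (l, d) for l, d in correct]

clean_fit = least_squares_line(correct)
biased_fit = least_squares_line(corrupted)
```

Because all three errors lie on the same side of the true line, the fitted line is dragged well below the correct disparities along the whole segment, rather than the errors cancelling out.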


Figure 6: Plot of real disparity versus length on a segment


We have to throw away the points plotted due to wrong matches
before we can interpolate a line through the disparity values. If
more edges are matched to the correct segment than to any other
single wrong segment (Type I error), then we can detect the errors
in the following way. We fit thin strips of all orientations and
locations to the points. From the definition of Type I errors it
follows that the strip which has the maximum number of points in
it has the disparity values corresponding to the correct matches;
the rest are erroneous. The reason for choosing thin strips
instead of straight lines is that, due to the discrete location of
the edges and the error in the location of edges, all the points
from correct matches do not fall exactly on a line.

To fit the thin strips to the data we use a version of the
modified Hough transform technique proposed by Wallace [10].
Consider figure 7. The window ABCD of the d-l plane contains all
the plotted disparities. The thin strip fitted on the correct data
points has to intercept AB and CD. We choose the intercepts i1 and
i2 made by the line passing through the center of the strip at AB
and CD as the parameters for the Hough transform. The width of the
strip will depend on the size and type of filters used for edge
detection. If the edge detector cannot resolve two edges less than
p pixels apart, then the width of the strip has to be less than
2p. We use a strip three pixels wide and we allow an overlap of a
pixel between adjacent strips. We will then have to consider only
every other pixel along AB and CD. [Sentence partly illegible in
the source: it gives the size of the Hough transform array in
terms of the range d of disparities in the image.]

Once the correct matches have been identified, we interpolate a
line through their disparities. From this line we can get the
disparities of all the edges on the segment. These new disparities
are stored as real numbers since we can get subpixel disparity
values. We also store the disparities at the begin and the end
points of segments for display purposes. This process is repeated
for all linear segments in the image.
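
The strip search described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it counts points for every candidate pair (i1, i2) exhaustively instead of using a voting accumulator, it measures the distance from a point to the strip centre line along the disparity axis, and the names `best_strip`, `fit_line` and `corrected_disparities` are hypothetical.

```python
def best_strip(points, length, d_min, d_max, half_width=1.5, step=2.0):
    """Find the strip holding the most (l, d) points.

    A strip is parametrised by the disparities i1 and i2 of its centre
    line at l = 0 and l = length. Candidates are spaced `step` apart,
    i.e. every other pixel for strips three pixels wide.
    """
    n_steps = int((d_max - d_min) / step) + 1
    candidates = [d_min + k * step for k in range(n_steps)]
    best_count, best_inliers = -1, []
    for i1 in candidates:
        for i2 in candidates:
            slope = (i2 - i1) / length
            inliers = [(l, d) for l, d in points
                       if abs(d - (i1 + slope * l)) <= half_width]
            if len(inliers) > best_count:
                best_count, best_inliers = len(inliers), inliers
    return best_inliers


def fit_line(points):
    """Least-squares line d = a*l + b through the inlier points only."""
    n = len(points)
    mean_l = sum(l for l, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((l - mean_l) ** 2 for l, _ in points)
    sxy = sum((l - mean_l) * (d - mean_d) for l, d in points)
    a = sxy / sxx
    return a, mean_d - a * mean_l


def corrected_disparities(points, positions, length, d_min, d_max):
    """Real-valued disparity for every edge position on the segment."""
    a, b = fit_line(best_strip(points, length, d_min, d_max))
    return {l: a * l + b for l in positions}


# Hypothetical segment: d = 0.5*l + 20, wrong matches at l = 2, 5, 8,
# and an unmatched edge at l = 10.
matched = [(l, 5.0) if l in (2, 5, 8) else (l, 0.5 * l + 20.0)
           for l in range(10)]
result = corrected_disparities(matched, range(11), 10.0, 0.0, 30.0)
```

Fitting the final line only to the points inside the winning strip means the wrong matches no longer influence the interpolated disparities, and the unmatched edge at l = 10 receives a value from the same line.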


Figure 7: Thin strip containing the correct disparities

At this stage we are not able to detect Type II errors.
Information from other sources which provide depth information,
such as monocular cues, shape from shading, smoothness constraints
on surfaces, another stereo matcher, etc., might indicate that the
disparity obtained at some of the segments is wrong, thus
detecting Type II errors.

6. QUANTITATIVE COMPARISON OF
MATCHING ALGORITHMS

It will be very useful to have a quantitative measure of the
performance of a stereo algorithm. This measure could be used
during the development phase of a stereo algorithm. Say, for
example, a line-by-line stereo algorithm is being developed and
the match evaluation function uses some weights and thresholds.
The measure of performance could be used to fine tune the weights
and thresholds for a particular class of images. Better yet, if
the performance measure could be calculated cheaply and
automatically, it could be used to dynamically change the weights
and thresholds as different parts of an image are processed.

Some stereo algorithms are designed for specific image classes,
like rolling terrains, while others are designed to be general.
Even a general purpose stereo algorithm may exhibit a bias towards
some particular type of imagery due to assumptions made in its
design (smoothly varying surfaces, no order reversals, etc.).
Applications of stereo usually use a certain type of images. For
example, cartographic mapping would use aerial images. [The
remainder of this paragraph is illegible in the source.]


An obvious performance measure would be the percentage of matching
errors and the percentage of edges matched. The percentage of
edges matched could be easily obtained from the stereo algorithm
itself. Used alone it could be misleading, as it would not be
known how many of those matches were incorrect. To calculate the
percentage of errors, we need an algorithm to consistently
identify the wrong matches. As we have seen in section 5, we can
locate Type I errors consistently.

The algorithm to calculate the percentage error is:

(1) Link the edges into linear segments.

(2) Plot the disparities obtained along the length of the segment.

(3) Fit thin strips to them to identify the correct matches.

(4) Accumulate the number of correct and incorrect matches.

(5) Repeat steps 2 to 4 for each segment.

(6) Percentage error = 100 * wrong-matches / total-matches.

One might question the need of evaluating the error percentage,
since this algorithm not only detects errors but can also correct
them. We still need to evaluate the performance since:

1. The more points we have on a linear segment with the correct
disparity, the more accurately we can interpolate the disparity
for the full segment. Therefore, even with the error correction,
we would prefer low error stereo algorithms.

2. This technique does not work for very short segments due to
the small number of points available. We cannot correct the
matches on such segments. The smaller the overall matching error
is, the more confidence we can have in the matches for edges along
these segments.

3. We can detect only Type I errors by this method. Type II errors
are also a serious problem in stereo. However, if more edges on a
segment with Type I errors were matched incorrectly, it might have
deteriorated into a Type II error. So a large percentage of Type I
errors is also a good indication that a lot of Type II errors are
also present.

4. The performance of area based stereo algorithms is usually the
poorest at intensity discontinuities. This method can give an
evaluation of their performance at the edges (which might not hold
for their performance away from edges).

There are some obvious limitations of this evaluation algorithm.
It cannot measure the amount of Type II errors. Because of this it
cannot be used to evaluate the performance of stereo algorithms
which match segments [8] (or more complex features).

7. COMPARISON OF A FEW COST FUNCTIONS FOR
LINE-BY-LINE STEREO

We take a typical line-by-line stereo algorithm using dynamic
programming to compute the best matched edge sequence along an
epipolar line, i.e. the sequence which has the minimum total cost
of matching edge pairs. The algorithm chosen is one implemented by
Ohta and Kanade for their intra-scanline search, and details can
be found in [5]. We evaluate the matching performance using the
match cost function proposed by them against a modification of
this function and two other functions.



Table 1 shows the performance of this algorithm using four
different match evaluation functions on the stereo pair in
figure 8. The edges and the linear segments have been detected
using Nevatia and Babu's linear segment extraction algorithm [11].
Figure 9 displays the disparities along those segments in the left
image that have been processed by the error detection and
correction phase.

The following are among the measures used for the evaluation:

- Total number of segments selected:

We use segments of length greater than a set limit to do error
detection.

- Total number of segments processed:

From the selected segments, only segments having more than a fixed
number of matched edges, and correct matches, are used for the
error statistics (and for correction) to ensure reliability.

- Percentage matched edge points processed:

This is a percentage ratio of the matched points which were on the
processed segments to all the edges matched.

- Percentage error:

This is the percentage of matched points which were in error among
the matched points processed.

- Percentage edge points corrected:

Number of edge points which were either corrected or had their
disparities filled in, as a percentage of all the edge points on
the processed segments.
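
As an illustration, the measures listed above could be computed from per-segment counts roughly as follows. This is a sketch under assumptions: the dictionary keys (`length`, `edges`, `matched`, `correct`), the threshold parameters and the function name are hypothetical, and the counts in the example are made up rather than taken from Table 1 or Table 2.

```python
def evaluation_measures(segments, total_matched, min_length,
                        min_matched, min_correct):
    """Compute the five evaluation measures from per-segment counts.

    Each segment is a dict with its length, its number of edge points,
    the number of matched edge points, and the number of matches that
    fell inside the best thin strip (taken as correct).
    """
    selected = [s for s in segments if s["length"] > min_length]
    processed = [s for s in selected
                 if s["matched"] > min_matched and s["correct"] > min_correct]
    matched = sum(s["matched"] for s in processed)
    wrong = sum(s["matched"] - s["correct"] for s in processed)
    edges = sum(s["edges"] for s in processed)
    # Edge points that were wrong or unmatched get a disparity from the line.
    fixed = sum(s["edges"] - s["correct"] for s in processed)
    return {
        "segments_selected": len(selected),
        "segments_processed": len(processed),
        "pct_matched_processed": 100.0 * matched / total_matched,
        "pct_error": 100.0 * wrong / matched,
        "pct_corrected": 100.0 * fixed / edges,
    }


# Made-up counts for four segments; 40 edges matched in the whole image.
segments = [
    {"length": 30, "edges": 30, "matched": 20, "correct": 18},
    {"length": 12, "edges": 12, "matched": 10, "correct": 9},
    {"length": 40, "edges": 40, "matched": 3, "correct": 3},   # too few matches
    {"length": 4, "edges": 4, "matched": 4, "correct": 4},     # too short
]
measures = evaluation_measures(segments, total_matched=40, min_length=5,
                               min_matched=5, min_correct=5)
```

The sketch assumes at least one segment is processed; with none, the percentages are undefined and the divisions would fail.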

The match evaluation functions compared were:

- Cost function I

This is the cost function used by Ohta and Kanade and its details
can be found in [5].

- Cost function II

To function I we add the restriction that matches are considered
only among edges whose directions do not differ from each other by
more than 30.0 degrees. The directions used are the orientations
of the segments the edges belong to, and not local edge
orientations.

- Cost function III

This function is formulated to favor matches among edges with
similar orientations and with similar interval lengths on their
left, between them and their preceding match.

[Formula illegible in the source: an expression in the orientation
difference of the two edges, the interval-length terms |a1 - a2|
and (a1 + a2), and the constants c1, c2, c3.]

where θ are the edge orientations, a the interval lengths and c
constants.

- Cost function IV

For this function we use the intervals lying on the left of the
two edge points. The cost of matching the two intervals is the
best cost of matching them using an area based stereo algorithm.
The area based stereo algorithm used is one proposed by
Le Guilloux [12].
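
The restriction that turns cost function I into cost function II can be sketched as a simple gate on segment directions. This is a hedged illustration: `base_cost` stands in for the value of cost function I, which is defined in [5] and not reproduced here; the function names are hypothetical; and treating directions as angles modulo 360 degrees is an assumption.

```python
import math


def direction_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def cost_function_2(base_cost, left_segment_dir, right_segment_dir,
                    max_diff_deg=30.0):
    """Gate a match cost on the directions of the segments the edges belong to.

    base_cost is a placeholder for cost function I, computed elsewhere.
    Matches between edges whose segment directions differ by more than
    max_diff_deg are ruled out by giving them infinite cost.
    """
    if direction_difference(left_segment_dir, right_segment_dir) > max_diff_deg:
        return math.inf
    return base_cost
```

Returning an infinite cost lets a dynamic programming search skip the forbidden pairing without any special-case logic in the search itself.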


Table 1: Performance of cost functions on image I

Table 2: Performance of cost functions on image II


Figure 9: Disparity values at processed segments


The performance of these algorithms for the image pair in
figure 10 is listed in table 2. Figure 11 shows the disparities
along the processed segments of the left image.

In the comparison of the four match evaluation functions it is
clear that functions I and II perform appreciably better than
functions III and IV, but among functions I and II it is difficult
to choose the winner.


8. CONCLUSION

The proposed algorithm is linear in the number of edges processed.
Therefore, it is a very cheap way of ensuring figural continuity
along segments. Although the algorithm does not use the continuity
constraint during the matching process, we believe that if a
substantial number of the edges can be processed by this
algorithm, we can afford to ignore the other matched edges. Even
if we use the corrected disparities as fixed constraints for a
second pass over the image pair, the algorithm would both be
computationally cheaper and would use more continuity constraints
than [5].

We have been able to demonstrate that we can process a large
number of the matched edges and detect Type I errors in them. We
can also correct these errors and fill in disparities for edges
not matched. However, this algorithm is weak in the following
areas:

- Not all matched edges are processed. This is due to the fact
that at each segment we need some minimum number of correct
matches before we can confidently interpolate disparity for the
whole segment.

- The problem of Type II errors has not been addressed.

The next step in this research will be to use the corrected
disparities as input constraints for reprocessing the stereo pair
using the same or a different stereo algorithm.



References

1. Barnard S. and Fischler M., "Computational Stereo," ACM
Computing Surveys, Vol. 14, No. 4, December 1982, pp. 553-572.

2. Henderson Robert L., Miller Walter J., and Grosch C. B.,
"Automatic Stereo Recognition of Man-Made Targets," Society of
Photo-Optical Instrumentation Engineers, Vol. 186, Digital
Processing of Aerial Images, August 1979.

3. Arnold R., "Automated Stereo Perception," Tech. report No.
STAN-CS-83-961, Stanford University, Computer Science Department,
March 1983.

4. Baker H., "Depth From Edge and Intensity Based Stereo," Tech.
report No. STAN-CS-82-930, Stanford University, Computer Science
Department, September 1982.

5. Ohta Y. and Kanade T., "Stereo by Intra- and Inter-scanline
Search Using Dynamic Programming," Tech. report CMU-CS-83-162,
October 1983.

6. Grimson W.E.L., "Computational Experiments with a Feature Based
Stereo Algorithm," Pattern Analysis and Machine Intelligence,
Vol. 7, No. 1, 1985.

7. Mayhew J.E.W. and Frisby J.P., "Psychophysical and
Computational Studies towards a Theory of Human Stereopsis,"
Artificial Intelligence, 1981, pp. 349-385.

8. Medioni G. and Nevatia R., "Segment-based Stereo Matching,"
Proceedings of DARPA Image Understanding Workshop, Washington DC,
June 1983.

9. Fischler M.A. and Bolles R.C., "Random Sample Consensus: A
Paradigm for Model Fitting with Applications to Image Analysis and
Automated Cartography," Communications of the ACM, Vol. 24, No. 6,
June 1981, pp. 381-395.

10. Wallace R.S., "A Modified Hough Transform for Lines,"
Proceedings of IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, San Francisco, June 1985.

11. Nevatia R. and Babu K.R., "Linear Feature Extraction and
Description," Computer Graphics and Image Processing, Vol. 13,
1980, pp. 257-269.

12. Le Guilloux, Yann, "Détermination automatique du mouvement
dans une séquence d'images. Intérêt pour l'interprétation," PhD
dissertation, Ecole Nationale Supérieure des Télécommunications,
June 1984.




GEOMETRIC GROUPING OF STRAIGHT LINES


R. Weiss, A. Hanson, E. Riseman
Computer and Information Science Department
University of Massachusetts
Amherst, Massachusetts 01003


Abstract 

This paper presents a new approach to the extraction of straight
lines based on geometric grouping. Zero crossing points of the
Laplacian of intensity images and the gradients at those points
are used. These edges are input to a hierarchical linking and
merging algorithm. Edges are linked based on both intrinsic and
geometric properties, e.g. if their gradients are similar, their
contrasts are similar, and their endpoints are close. In the
merging process, if a sequence of linked edges can be approximated
sufficiently well by a straight line, they are grouped and are
replaced by longer straight lines. The hierarchy allows us to
represent lines at multiple scales, but does not involve smoothing
the image. There are four advantages of this approach for
extracting straight lines: 1. It links collinear line segments
even when they are separated by gaps. 2. Low contrast lines may be
found as easily as high contrast lines. 3. It is less sensitive to
texture than zero crossing contours when extracting boundary
lines. 4. It can find lines which may not be straight locally but
are straight at a larger scale.

1.0  INTRODUCTION 

The extraction of lines based on either significant intensity
changes or perceived boundaries between areas is a difficult and
important step in image understanding. This paper presents a new
approach to the extraction of straight lines based on geometric
grouping. The primary goal is the extraction of straight lines
from images in which there are fragmented intensity
discontinuities. The secondary goal is the demonstration that the
use of geometric organization can be an important part of the line
extraction process and therefore can produce improvements when
combined with standard edge detection techniques.

Our view of the task of image understanding is that it is based on
a process of organizing events in an image or sequence of images
into structures which can be matched with models of objects in the
physical world. A postulate of this paper is that this
organization process uses both geometric and intrinsic properties
of structures in the image. In particular we apply this to
straight lines.

A straight line is not just a local event. Figure 1 shows some
examples of image events which are straight lines at some scales
but not at others. In this paper, a straight line is defined
geometrically by the property that it is composed of a sequence of
line segments which are approximately collinear and that each
segment is close to its successor. Both of these criteria depend
on scale; long lines, for example, can be separated by a larger
gap than small ones and still be close. The intrinsic property
which defines straight lines is that the intensity gradients must
be similar in magnitude and direction along the line. There are
other possible definitions of straight lines, which would lead to
different results. For example, Figure 1c shows dots which are
perceived as a straight line, but not on the basis of gradient
information.

Figure 1. Events which can be perceived as a straight line.

In  Section  2,  we  describe  the  algorithm  used  to  find 
straight  lines  as  we  have  defined  them  and  in  Section  3  we 
present  some  of  the  experimental  results. 

2.0 DESCRIPTION OF THE ALGORITHM

The two major components of the algorithm for extracting straight
lines are edge detection and hierarchical grouping. Hierarchical
grouping has two steps which are performed at each level: linking
and merging.

2.1  Edge  Detection 

There are many edge detection algorithms which might be used. The
main requirements are that the algorithm produce measurements of
the intensity contrast and direction of the edge. The two
algorithms which we have used for selecting points are zero
crossings of the Laplacian operator and a directional edge
operator based on the work of Haralick and Canny [2,4]. Although
there may be algorithms which have better performance in some
cases, the ones chosen are representative of a class of
algorithms; most of them are expected to have the same types of
problems which we encounter here [3].

Calculation  of  initial  edges 

Since  most  of  the  results  have  been  obtained  for  the 
Laplacian  operator,  we  describe  the  processing  for  that  one 
in  more  detail.  First,  the  image  is  convolved  with  a  3x3 
mask  which  approximates  the  Laplacian: 

    0   1   0
    1  -4   1
    0   1   0
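
A minimal sketch of applying this mask is given below. It is an illustration only: the image is a plain list of rows, border pixels are left at zero (an assumption, since the text does not say how borders are treated), and the step-edge image is a made-up example.

```python
LAPLACIAN_MASK = [[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]]


def laplacian(image):
    """Apply the 3x3 Laplacian mask to the interior pixels of an image.

    The mask is symmetric, so correlation and convolution coincide.
    Border pixels are left at 0 (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            out[y][x] = sum(LAPLACIAN_MASK[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out


# Made-up example: a vertical step edge between columns 2 and 3.
step_image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
lap = laplacian(step_image)
```

On the step edge the response is positive on the dark side and negative on the bright side, so the sign change marks the edge position.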

The  advantages  of  using  the  Laplacian  operator  are: 

1.  High  positional  accuracy  of  an  edge,  even  with  aliasing. 

2.  Good  sensitivity  to  high  frequency  data. 


3. Reduction of the data. Many pixels don't produce edge points.

Next, zero crossings of the Laplacian image are detected and their
positions determined by linear interpolation. An edge, which is a
line of length 1, is positioned with its center at a zero
crossing. If a pixel value is zero, the adjacent pixels are
checked, so only true zero crossings are considered. The
orientation of the edge is determined by the gradient at the
midpoint of the edge, i.e. the edge orientation is perpendicular
to the gradient. Each edge carries with it a magnitude and
direction (signed gradient magnitude) and each edge (now
considered as a line) has an initial start point and a final end
point.
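
Locating a zero crossing by linear interpolation can be sketched as follows. This simplified illustration only looks at horizontally adjacent pixels with a strict sign change; the handling of pixels that are exactly zero and of vertical neighbours described in the text is omitted, and the function name is hypothetical.

```python
def horizontal_zero_crossings(lap):
    """Sub-pixel positions where the Laplacian changes sign along a row.

    If the Laplacian is a at column x and b at column x + 1, with
    opposite signs, a straight line through the two values crosses
    zero at x + a / (a - b).
    """
    crossings = []
    for y, row in enumerate(lap):
        for x in range(len(row) - 1):
            a, b = row[x], row[x + 1]
            if a * b < 0:
                crossings.append((x + a / (a - b), y))
    return crossings


# Made-up Laplacian rows: a symmetric crossing and an asymmetric one.
example = [[0, 0, 10, -10, 0, 0],
           [0, 6, -2, 0, 0, 0]]
found = horizontal_zero_crossings(example)
```

In the second row the crossing lies three quarters of the way from column 1 to column 2, because the positive response is three times larger than the negative one.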

When the Laplacian surface looks like a saddle point near the zero
crossing set, then this set will look like a pair of hyperbolas or
a pair of lines which cross. Experiments were performed in which 4
edges were created at these points to act as "continuation" edges
for those around the saddle point. It was found that saddle points
could be ignored, since the geometric context provided enough
information to bridge the gaps when those points were omitted.

The most important difference of our approach from the usual one,
which is to follow the zero crossing contour, is that it uses the
gradient information at the edge point. Since the Laplacian
operator only depends on second order terms, the tangent to the
zero crossing contour is independent of the gradient.

Ordinarily there are several sources of problems with the
Laplacian. Subsampling of an image and the presence of noise can
result in isolated spots which have a high contrast with their
neighbors. As a result, the zero crossing contours surround these
isolated pixels and produce a point-like structure. Noise can
introduce numerous local maxima in the gradient of the intensity,
producing multiple edges parallel to the visually significant one.
In addition there are zero crossings of the Laplacian where the
gradient has a local minimum. These are called anti-edges, and the
current implementation doesn't test for this event. Many of these
problems can be solved by smoothing with different width Gaussian
masks. Witkin and others have explored this using scale space
[5,6]. The problem with smoothing is that it removes details which
are important for some structures. For example, a very thin line,
even if it is long, will become undetectable if high frequency
data are filtered out. Our approach is to use the geometric
context to eliminate edges which do not form lines in the image.

Directional  edge  operators 

As an alternative, the zero crossings of an operator described by
Haralick have been used instead of the Laplacian to determine the
positions where gradients should be sampled. The Haralick operator
applied to an image with intensity function I is

$$\nabla (\nabla I)^{2} \cdot \nabla I$$

We performed experiments in which the gradient operator was
approximated by 1x2, 2x2, and 3x3 masks. We did not use an
approximation of the data by polynomial functions. Since an image
is given by discrete data, it is possible for the gradient to
change direction completely without changing magnitude, so this
operator would not detect such high frequency edges. An example is
shown in Figure 2. There were such problems with high frequency
data using the Haralick operator; consequently most of our
experiments were done with the Laplacian.

0000000000000000
1111111111111111
0000000000000000
1111111111111111
Figure 2: High frequency edges
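
The behaviour on this pattern can be checked numerically with a short sketch. It assumes a 1x2 (forward difference) approximation of the gradient and a small 6x6 version of the alternating-row image; both are choices made for this illustration.

```python
# Rows alternate between intensity 0 and 1 (small version of Figure 2).
image = [[y % 2] * 6 for y in range(6)]


def gradient_magnitude_squared(img, x, y):
    """Squared gradient magnitude from forward differences (1x2 masks)."""
    gx = img[y][x + 1] - img[y][x]
    gy = img[y + 1][x] - img[y][x]
    return gx * gx + gy * gy


def laplacian_at(img, x, y):
    """3x3 Laplacian mask response at an interior pixel."""
    return (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
            - 4 * img[y][x])


# The gradient flips direction from row to row, but its magnitude never changes.
magnitudes = {gradient_magnitude_squared(image, x, y)
              for y in range(5) for x in range(5)}

# The Laplacian, in contrast, changes sign between every pair of rows.
laplacian_column = [laplacian_at(image, 2, y) for y in range(1, 5)]
```

Since the gradient magnitude is constant, an operator that looks for changes in gradient magnitude has nothing to respond to, while the Laplacian alternates in sign and so yields a zero crossing between each pair of rows.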

2.2 The Hierarchical Grouping Process

The goal is the grouping of local edges or lines into longer
lines, which implicitly defines a scale-space hierarchy. At each
level, starting with edges which are one pixel, first linking is
performed and then merging to produce lines in the next higher
level in the hierarchy.


Linking 

Linking of straight lines is a search for almost collinear pairs
which are close to each other. The goal of this process is to
reduce the search for candidate sets of lines for merging into
longer lines. Lines within a linking radius of the endpoint of a
given line are tested before linking. There are four criteria used
for linking:

1. Similar gradient magnitude. The gradient magnitudes must be
close to each other. The current system uses a factor of 2 as the
test.

2. Approximately collinear. We required that the directions of the
two lines must be within 30 degrees of each other, and that the
distance between the midpoint of the second line and the first
line be small. Lines 180 degrees apart are not linked.

3. Endpoints must be close. This is measured by the projection of
the endpoint of one line segment onto the other line. The
measurement need not be isotropic, and the criterion will depend
on the scale.

4. Lines do not overlap. The final point of the first line must be
closer to the initial point of the second than it is to the final
point of the second.
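
The four linking criteria above can be sketched as a single predicate. This is an illustration under assumptions: the `Line` class, the thresholds `max_offset` and `max_gap`, and the way the endpoint gap is measured (projection onto the direction of the first line) are hypothetical choices, since the text gives only the factor of 2 and the 30 degree limit. Lines are assumed to have non-zero length.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    x0: float    # initial point
    y0: float
    x1: float    # final point
    y1: float
    grad: float  # gradient magnitude


def can_link(a, b, max_offset=1.0, max_gap=3.0):
    """Return True if directed line a may be linked to directed line b."""
    # 1. Similar gradient magnitude: within a factor of 2.
    lo, hi = sorted((abs(a.grad), abs(b.grad)))
    if lo == 0 or hi > 2 * lo:
        return False
    # 2. Approximately collinear: directions within 30 degrees ...
    ang_a = math.atan2(a.y1 - a.y0, a.x1 - a.x0)
    ang_b = math.atan2(b.y1 - b.y0, b.x1 - b.x0)
    diff = abs(ang_a - ang_b) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)
    if math.degrees(diff) > 30.0:
        return False
    # ... and the midpoint of b close to the infinite line through a.
    length_a = math.hypot(a.x1 - a.x0, a.y1 - a.y0)
    ux, uy = (a.x1 - a.x0) / length_a, (a.y1 - a.y0) / length_a
    mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
    if abs((mx - a.x0) * uy - (my - a.y0) * ux) > max_offset:
        return False
    # 3. Endpoints close: gap from a's final point to b's initial point,
    #    projected onto a's direction.
    gap = (b.x0 - a.x1) * ux + (b.y0 - a.y1) * uy
    if abs(gap) > max_gap:
        return False
    # 4. No overlap: a's final point is nearer to b's start than to b's end.
    d_start = math.hypot(b.x0 - a.x1, b.y0 - a.y1)
    d_end = math.hypot(b.x1 - a.x1, b.y1 - a.y1)
    return d_start < d_end
```

Because the lines are directed, the 30 degree test also rejects pairs whose directions are 180 degrees apart, which covers the last sentence of criterion 2.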

One can view the result of the linking process as a directed graph
with the line segments as the vertices and the links as the arcs.
In general, a line will be linked with many other lines. The
merging process examines paths in this graph and tests them for
straightness. The linking process effectively reduces the number
of combinations of lines which must be examined by the merging
process.

Merging 

The  merging  process  consists  cf  grouping  and  replace¬ 
ment,  and  incorporates  the  geometric  context.  The  amount 
of  context  used  it  the  search  radius,  which  bounds  th?  length 
of  a  sequence  of  lines  which  is  grouped  and  tested  for  straight¬ 
ness.  This  sequence  of  lines  is  approximated  by  a  straight 
line,  and  if  the  approximation  is  good,  then  a  subsequence 
is  replaced  by  a  straight  line  at  a  larger  scale.  The  algo- 


rithm  proceed*  by  examining  each  line  in  the  link  graoh 
and  performing  the  following  steps: 

1. If the line has already been merged with others, skip it.

2. Generate all paths of lines linked to the initial point or the
final point, within the search radius.

3. Generate all paths which are combinations of paths from step 2:
a path to the initial point, the line itself, and a path from the
final point.

4. Fit a straight line to the set of endpoints, and measure the
closeness of fit with a metric whose printed form is illegible in
this copy. The quantities entering the metric are the distance
from each endpoint to the approximating line, the number of
endpoints, relative weights $w_i$ based on length, and the
distance from the initial point of the first line to the final
point of the last. In addition, there is a curvature measure,
which is the reciprocal of the radius of the circle which is the
best fit to the set of endpoints. This curvature is weighted by
the sum of the lengths of the lines because the curvature is more
significant for long lines than for short ones. The curvature is
used to filter out curved lines.

5. Choose the path with the minimum error as measured above. Using
the replacement radius, a subsequence of the path is replaced by a
straight line whose gradient magnitude is the weighted average of
those of its parts. Other paths through the original line are also
replaced, but we require that there be at least a 40 degree
difference in slope from one which has already been added. A
single line may be copied to the next larger scale if it is long
enough.
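Step 4 can be illustrated with a weighted orthogonal line fit. This is only a sketch under stated assumptions: the exact form of the straightness metric is not reproduced here, so the code uses a weighted root-mean-square perpendicular distance, weights each endpoint by the length of its segment, and divides by the distance from the first initial point to the last final point. The names `fit_line` and `path_error` are hypothetical, and the curvature filter is not shown.

```python
import math

def fit_line(points, weights):
    """Weighted orthogonal (total least-squares) line fit.

    Returns (centroid, angle, rms), where rms is the weighted
    root-mean-square perpendicular distance of the points to the line.
    """
    total = sum(weights)
    cx = sum(w * x for w, (x, y) in zip(weights, points)) / total
    cy = sum(w * y for w, (x, y) in zip(weights, points)) / total
    sxx = sum(w * (x - cx) ** 2 for w, (x, y) in zip(weights, points)) / total
    syy = sum(w * (y - cy) ** 2 for w, (x, y) in zip(weights, points)) / total
    sxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points)) / total
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal axis direction
    nx, ny = -math.sin(angle), math.cos(angle)      # unit normal to the line
    mean_sq = sum(w * ((x - cx) * nx + (y - cy) * ny) ** 2
                  for w, (x, y) in zip(weights, points)) / total
    return (cx, cy), angle, math.sqrt(mean_sq)

def path_error(path):
    """Assumed error measure for a path of segments (smaller is straighter)."""
    points, weights = [], []
    for start, end in path:
        length = math.hypot(end[0] - start[0], end[1] - start[1])
        points += [start, end]
        weights += [length, length]   # assumption: weight endpoints by length
    span = math.hypot(path[-1][1][0] - path[0][0][0],
                      path[-1][1][1] - path[0][0][1])
    return fit_line(points, weights)[2] / span

straight = [((0.0, 0.0), (2.0, 0.0)), ((2.5, 0.0), (5.0, 0.0)), ((5.5, 0.0), (9.0, 0.0))]
bent = [((0.0, 0.0), (3.0, 0.0)), ((3.0, 0.0), (3.0, 3.0))]
best = min([bent, straight], key=path_error)
```

Choosing the minimum-error path, as in step 5, then reduces to `min(paths, key=path_error)`.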

The  hierarchical  representation 

The hierarchy consists of multiple planes, each at a different
scale. A plane is a set of directed line segments together with a
gradient magnitude. The scale of a plane is the range of the
lengths of lines which can be stored in that plane. In the current
implementation these ranges overlap. Thus lines which are not
straight at one scale can still pass the straightness test at the
next larger scale. Each plane is divided up into a grid
corresponding to the linking radius in order to reduce the search
computation during the linking step. Although all of the planes
share the same coordinate system, there is no direct connection
between the individual lines and the pixels of the original image.
The grid is dependent on scale so that for any given plane the
number of lines in a grid element should remain roughly the same
over scales. This is also based on the assumption that after
linking and merging, the number of lines will be reduced by a
constant factor. The hierarchy has 4 features:

1. It reduces the search space for sequences of lines to be linked
and merged by making the search radius small at small scales.

2. It reflects the observation that "closeness" of lines is scale
dependent.

3. It allows for a multi-scale representation of a line which may
be straight only at large scales.

4.  It  is  a  compact  representation. 

3.0  EXPERIMENTAL  RESULTS 

The algorithm has successfully been applied to many images. Two of
these are shown in Figure 3. As described above, the first step is
the computation of the zero crossings of the Laplacian. Figure 4
shows the zero crossings for the image in Figure 3a. As one can
see, there are several places where the contour does not follow
the boundary of the roof but branches off into the texture of the
roof or the texture of the trees. Figure 5 shows the filtered
output of the Burns straight line algorithm [1] with lines of
length greater than or equal to 5. His algorithm is very
successful in locating many of the straight lines, but some of
them are fragmented. One can also find instances where the Burns
algorithm produces multiple parallel lines in a slow gradient,
while the geometric grouping algorithm here does not. However, the
problem of multiple parallel lines could occur and would require
the use of two dimensional merging as a solution. Figure 6 shows
the unfiltered output of the geometric grouping algorithm. By
filtering this output to keep only the long lines, one can extract
those straight lines which are likely to be significant, for
example as boundaries of objects in the image. The results after
filtering are shown in Figure 7. Figure 8 shows the results
similarly obtained for the image in Figure 3b. Experiments have
also been performed with aerial images, and the results are very
encouraging. Figure 9 shows the results for a car with few long,
straight lines. In the figure, the thickness of the lines is
proportional to the contrast.

So far the only problem which we have encountered is overlinking
of lines. As one can see in the roof in Figure 6, edges in the
texture which have gaps but are collinear are sometimes linked and
merged. In this case the gaps between the lines are important.
Here the algorithm may be improved by using the density of lines
to inhibit linking when the density is high. In our
implementation, the linking radius depends on the scale, but it
should really depend on the density of lines. We can see this in
our own perception. If there is a high density of lines with
different directions, we would only perceive a straight line if
the gaps between the fragments were very small. Nevertheless,
there will be cases where small gaps are important, and it will be
necessary to consider other geometric context than just
collinearity.

4.0 CONCLUSION

The results shown in this paper indicate that the use of geometric
grouping in extracting long, straight lines produces a significant
improvement over results obtained from standard edge detection
algorithms. The linking and merging of straight lines into longer
straight lines is a paradigm for the general process of linking
and merging of geometric structures into larger and more abstract
structures. It is clear from the way in which this works for
straight lines that it is naturally a hierarchical process. An
object may have different representations at different scales or
levels of abstraction. In addition, the efficiency of the
computation is dependent on hierarchical processing. Lastly, the
geometric context to be used in computation is also a function of
the level in the hierarchy.

1. Burns, J.B., Hanson, A., Riseman, E., "Extracting Straight
Lines", 7th Int. Conf. on Pattern Recognition, pp 482-485,
Montreal, 1984.

2. Canny, J.F., "Finding Edges and Lines in Images", A.I. Lab MIT
Tech. Report AI-TR-720, 1983.

3. Davis, L., "A Survey of Edge Detection Techniques", Computer
Graphics and Image Processing, v.4, pp 248-270, 1975.

4. Haralick, R.M., "Digital Step Edges from Zero Crossings of
Second Directional Derivatives", IEEE Trans. Pattern Anal. Mach.
Intell., v.6, pp 58-68, 1984.

5. Witkin, A., "Scale-Space Filtering", Proceedings of IJCAI,
pp 1019-1021, Karlsruhe, 1983.

6. Yuille, A.L., and Poggio, T., "Scaling Theorems for
Zero-crossings", MIT A.I. Memo 730, 1983.

Acknowledgements

This presentation is based on the work of Michael Boldt, who is
currently a graduate student at the University of Massachusetts.
We would also like to thank Brian Burns for his helpful ideas.
Funding for this research was provided by DARPA grant
N00014-81-K-04o4.


Figure 3. Two digitized images to demonstrate algorithm


Figure 4. Zero crossings of the Laplacian for a part of Figure 3a


Figure 5. Filtered output from the Burns algorithm, length >= 5


On Detecting Edges


Vishvjit S. Nalwa

Thomas O. Binford

A.I. Lab., Stanford University, CA 94305


Abstract 

An edge in an image corresponds to a discontinuity in the
intensity surface of the underlying scene. It can be approximated
by a piecewise straight curve composed of edgels, i.e. short,
linear edge-elements, each characterized by a direction and a
position. The approach to edgel-detection here is to fit a series
of one-dimensional surfaces to each window (kernel of the
operator) and accept the surface description which is adequate in
the least squares sense and has the fewest parameters. (A
one-dimensional surface is one which is constant along some
direction.) The tanh is an adequate basis for the step-edge and
its combinations are adequate for the roof-edge and the line-edge.

The proposed method is robust with respect to noise; for
(step-size / $\sigma_{noise}$) > 2.5, it has subpixel position
localization ($\sigma_{position}$ < 1/3) and an angular
localization better than 10°; further, it is insensitive to
gradients. These results are demonstrated with analysis,
statistical data and edgel-images. Also included is a comparison,
of performance on a real image, with a typical operator
(Difference-of-Gaussians). The results indicate that the proposed
operator is superior with respect to detection, localization and
resolution.


I.  Introduction 

An edge in an image corresponds to an intensity discontinuity in
the scene. Although it may correspond to an edge of an object in
space, it need not. It might well be the image of a shadow
(illumination discontinuity) or a surface mark (reflectance
discontinuity).


This is an updated version of a paper with the same title [12]
which was published in the Proceedings of the Image Understanding
Workshop held at New Orleans on October 3-4, 1984. It has been
accepted for publication in the IEEE Transactions on Pattern
Analysis and Machine Intelligence.

This work was supported in part by the Defense Advanced Research
Projects Agency under contract N00039-84-C-0211. During its early
phase, V.S.N. was supported by the Information Systems Laboratory
at Stanford.


It is hard to over-emphasize the importance of edge-detection in
image understanding. Most modules in a conceivable vision system
depend, directly or indirectly, on the performance of the
edge-detector. Consequently, there has been a substantial effort
in this direction. Despite this effort, many in the community
believe that the problem is largely unsolved. In fact, it may be
claimed with some justification, that research and motivation on
other fronts (e.g. stereo and line-drawing interpretation) has
been dampened by the ineffectiveness of existing detectors.

Blicher [4] provides an insightful review of previous work on
finding edges in image data [see also 7, 1]. Much of this work has
been based on discrete approximation to differential operators
[see 5]. Although edges do contain large first derivatives and
zero-crossings of the second, the mapping is neither one-one, nor
onto. It is well known that derivatives emphasize high-frequency
noise. In fact, the higher the order of the derivative, the more
pronounced the effect (taking the $n$th derivative of a function
is equivalent to multiplying its Fourier Transform by
$(i\omega)^n$). Further, operators that threshold on the first
derivative respond to smooth shading. For example, the
Nevatia-Babu Operator [13] and Canny's Operator [6] return false
edges on smoothly shaded surfaces. Lateral inhibition has been
proposed as a solution by Marr-Hildreth [11] and Binford [3].
However, this may involve taking 3rd order derivatives.

The noise-characteristics of an operator depend on its size. The
larger the operator, the more it averages out random noise.
However, it is also more likely to overlap several edges or
corners simultaneously and thus degrade the resolution capability.
The detectability and localization of high-curvature edges also
suffer. Further, as the operator size is increased, the
assumptions invoked in its design may break down, introducing
large and unknown biases. Whereas the noise sensitivity of an
operator depends on its size, the associated resolution capability
depends on the support used to make decisions. This support is
generally larger than the operator-size. For example, one may use
a (3 x 3) window to estimate the gradient at a point and then base
the decision on a local gradient maximum whose detection requires
considering at least three adjacent estimates. The lateral
decision-support which determines resolution in this case, is 5
pixels and not 3. Directional operators, like those of
Nevatia-Babu [13] and Binford [3], introduce implicit smoothing
which is larger along the edge rather than across it. Isotropic
operators, like Marr-Hildreth [11] and Shanmugam-Dickey-Green
[17], on the other hand offer simplicity and uniformity at the
expense of smoothing across edges. Gaussian smoothing has been
employed by Marr-Hildreth [11] and Canny [6] to reduce noise. This
can be decomposed into two orthogonal 1-D gaussian smoothing
operations: one along the edge and the other across it. Let us
consider the component along the edge. It is our claim that for a
given support along a locally straight edge, gaussian smoothing is
less effective than simple averaging. The argument runs as
follows. Given N equal intensities, each with identically,
independently distributed additive gaussian noise: the standard
deviation of the weighted average of the intensities is minimized
when the weights are all identical. (The standard deviation in
this case is reduced by a factor of $\sqrt{N}$.) This argument can
be equivalently carried out in the Fourier Domain. Gibbs
phenomenon, although present, is not of any significance along the
edge (it might, however, play a role at terminations).
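The averaging argument can be checked numerically. For independent noise of standard deviation sigma on each sample, the weighted average has standard deviation sigma * sqrt(sum of w_i^2) / (sum of w_i), which is a standard result. The sketch below compares uniform weights with gaussian-shaped weights over the same support; the function names, the support of 9 samples and the spread of 2.0 are illustrative assumptions.

```python
import math

def averaged_noise_std(weights, sigma=1.0):
    """Std of sum(w_i * x_i) / sum(w_i) when each x_i has i.i.d. noise of std sigma."""
    return sigma * math.sqrt(sum(w * w for w in weights)) / sum(weights)

def gaussian_weights(n, spread):
    """Gaussian-shaped weights centred on a support of n samples (assumed shape)."""
    centre = (n - 1) / 2.0
    return [math.exp(-0.5 * ((i - centre) / spread) ** 2) for i in range(n)]

n = 9
uniform = averaged_noise_std([1.0] * n)                      # equals 1 / sqrt(n)
tapered = averaged_noise_std(gaussian_weights(n, spread=2.0))
print(uniform, tapered)
```

By the Cauchy-Schwarz inequality the uniform case is the minimum, so any tapered weighting over the same support leaves more noise.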

Surface-fitting is among the other methods used to detect edges.
It has been employed, both as a means to estimate derivatives, as
by Prewitt [16] and Haralick [8], and as a classification
technique, as by Hueckel [9]. The chief problem has been the
choice of an adequate basis, i.e. a basis which can accurately
represent the feature sought to be detected. Further, these
attempts have largely failed to exploit directionality.

Most of the previous work has ignored the "blurring" effect of the
imaging system, which can be modeled approximately by gaussian
convolution [see 2]. Blurring avoids undersampling, and thus, as
will be shown later, facilitates sub-pixel localization of the
edge.

Few [e.g. 3] have treated edges as composed of edgels, i.e. linear
edge-elements, rather than edge-pixels. This directional
information, we claim, is not only an essential descriptor of an
edge at any point along its length, but is also valuable for
linking [see 12].

Time and again, claims to "optimality" have been made. Among
others, the claimants include Hueckel [9], Shanmugam-Dickey-Green
[17] and Canny [6]. However, often the analysis is in the
continuous domain, the assumptions and criteria questionable, and
the extensions to 2-D ad hoc. An "optimal" solution is only as
good as the optimality condition used.

In this paper, a variant of the surface-fitting approach is used;
however, there are significant differences from most previous
approaches. 1) An oriented one-dimensional surface, i.e. a surface
constrained to be constant along some direction, is used. This
results in effective noise-reduction without blurring the edges as
severely as in circularly symmetric smoothing operators. 2) We do
not seek to mark pixels as belonging to an edge, but to detect
edgels, i.e. short linear edge-elements, each characterized by a
direction and position. 3) The blurring function of the imaging
system, which is approximately gaussian, is taken into account.
This results in sub-pixel localization of the edge. Sub-pixel
localization could also be achieved by deconvolution followed by
the localization of discontinuities. Deconvolution, however, is an
ill-conditioned problem [see 2]. 4) An adequate basis has been
found not only for most step-edges, but also for roof-edges and
line-edges. These are various combinations of the tanh function
with a constant. 5) Binford suggested [3] that it is desirable to
do away with thresholds altogether. Any method which selects a
subset of candidate edgels has implicit thresholds (in our case
this corresponds to the selection of the best-fitting surface).
However, the choice of an explicit threshold does not play a
pivotal role in our scheme as in most others. This will be
illustrated in Section VII. In fact, it may be desirable to
postpone explicit thresholding to the linking stage.

We begin, in section II, by giving a definition of an edge in
terms of the intensity profile of the viewed scene. Then in
section III, some of the problems associated with edge-detection
based on zero-crossings of the second derivative are discussed.
Much of the work to date has used a variant of this criterion.
Section IV contains the details of our approach and Section V
outlines the algorithm step-by-step as it has been implemented for
step-edgel detection. This is followed in Section VI by a detailed
example.

The proposed approach to edgel-detection is robust with respect to
noise. For (step-size / $\sigma_{noise}$) > 2.5, it has sub-pixel
position localization ($\sigma_{position}$ < 1/3) and an angular
localization better than 10°. Statistical supporting evidence in
Section VII is accompanied by some simple analysis in Appendix
III. Further, our operator is insensitive to high intensity
gradients which do not correspond to edges. These claims are
substantiated in section VIII, with pictures of edgels estimated
from several images. This section also includes a comparison
between the performance of an implementation of the Marr-Hildreth
Operator [11] and our operator. The pictures presented indicate
that our operator is superior with respect to detection,
localization and resolution. We conclude in section IX.

It should be pointed out that the problems of multiple scale and
of linking edgels into extended edges are not considered here.
Reliable edgel-detection should be expected to make these problems
more manageable. Results on linked edges will be forthcoming in a
sequel.


II.  Definition  of  an  Edge 

Any extended edge in an image can be approximated by short linear
segments called edgels, each characterized by a position and an
angle. Edgels correspond to local discontinuities of various order
in the intensity surface of a scene. A discontinuity of the $n$th
order is one whose $n$th derivative contains a delta function.
Hence, a line-edge is a 0th order discontinuity, a step-edge is a
1st order discontinuity and a roof-edge is one of 2nd order. Some
examples are shown in Fig. 1. However, the images we obtain in
practice are degraded by optical and other aberrations. These
effects can be approximated by convolution with a gaussian [see 2]
of a certain standard deviation. Some of this "blurring" is
desirable, even though it limits the resolution, because it also
bandlimits the signal before it is sampled. Its absence would
result in severe aliasing. A manifestation of aliasing in a
picture would be the "staircase" appearance of edges which are
neither horizontal nor vertical. (Beware of mistaking scanner-line
jitter [see 2] for aliasing!) As a consequence of "blurring",
there are no intensity-discontinuities in the image. The
importance of this will be illustrated in the following sections.


Fig 1. Examples of edge-profiles, as they appear before being
"blurred" by the imaging system. (Panels: Step-Edges, Roof-Edge,
Line-Edges.)


III. Zero-Crossings of the Second Derivative

Much of the work to date has used zero-crossings of the second
derivative to detect and/or localize step-edges [11, 8, 6 etc.].
There are some problems associated with this. As indicated in the
introduction, derivatives amplify high-frequency noise. Further,
if surface-fitting is used to estimate derivatives and the basis
is inadequate, then the zero-crossing can result in extremely bad
localization, e.g. consider the case of a cubic-fit for a
step-edge cross-section which is located near the boundary of an
image-window (see Fig 2).

It is not hard to see that we can have zero-crossings in the
absence of an edge, e.g. at the base of a ramp [3]. Zero-crossings
of the second-derivative are essentially points of inflection and
these need not correspond to edges, as in the case of a corrugated
intensity surface. It is our claim that zero-crossing operators do
not adequately exploit the local intensity-profile of step-edges.


Fig 2. Inadequacy of the cubic-fit for a step-edge cross-section
which is positioned near the edge of a window. (Panel label:
Least-Squares Cubic-Fit.)


The intensity surface on the two sides of a step-edge will in
general be sloped, as indicated in Fig. 1. We will henceforth
refer to such an edge as a generalized step-edge. In contrast, an
ideal step-edge is constant on both sides and is a subset of the
former. Note that whenever we refer to an ideal or generalized
step-edge in an image, the imaging-system "blur" will be implicit.
A simple analysis (Appendix I) of a generalized step convolved
with a gaussian shows that, in the continuous case, the
localization based on zero-crossings would be biased by
$(\Delta_{slope}\,\sigma_{blur}^2 / \text{step-size})$, where
$\Delta_{slope}$ is the difference between the slopes on the two
sides of the step and $\sigma_{blur}$ is the standard-deviation of
the effective blurring gaussian mentioned in the previous section.
On more than one occasion, authors have suggested gaussian
preconvolution as a method of noise reduction [11, 6]. It can be
shown that this would effectively amount to having a blurring
function with a variance equal to the sum of the two variances and
hence, it would further degrade localization of generalized
step-edges.
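The bias can be reproduced numerically. The sketch below is an independent check rather than the paper's Appendix I: it assumes the bias has the form (slope difference) x (blur variance) / (step-size), which is what a direct calculation gives for a step of height `step` at x = 0 with slopes `slope_left` and `slope_right`, blurred by a gaussian of standard deviation `sigma`. The closed form of the blurred profile and all function names are assumptions introduced for this illustration.

```python
import math

def blurred_generalized_step(x, step, slope_left, slope_right, sigma):
    """Generalized step at x = 0 convolved with a gaussian of std sigma."""
    t = x / sigma
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))        # blurred unit step
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    delta = slope_right - slope_left
    # slope_left * x is unchanged by blurring; the extra ramp on the right
    # blurs to delta * (x * cdf + sigma * pdf).
    return slope_left * x + step * cdf + delta * (x * cdf + sigma * pdf)

def zero_crossing(step, slope_left, slope_right, sigma, h=1e-3):
    """Locate the zero of the second difference by bisection."""
    def second_difference(x):
        f = lambda u: blurred_generalized_step(u, step, slope_left, slope_right, sigma)
        return f(x + h) - 2.0 * f(x) + f(x - h)
    lo, hi = -2.0 * sigma, 2.0 * sigma
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if second_difference(lo) * second_difference(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def predicted_bias(step, delta_slope, sigma):
    return delta_slope * sigma ** 2 / step

measured = zero_crossing(10.0, 0.5, 2.5, 1.0)
# Pre-smoothing with a second gaussian of std 1.0 adds the variances.
sigma_total = math.sqrt(1.0 ** 2 + 1.0 ** 2)
measured_presmoothed = zero_crossing(10.0, 0.5, 2.5, sigma_total)
```

With these made-up values the true edge is at 0, and doubling the total variance by pre-smoothing doubles the displacement of the zero-crossing.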


IV.  The  Details 

A variant of the surface-fitting approach is used here. However,
unlike previous work, our basis is constrained to be directional
and is non-linear in its parameters. It is important for the
reader to distinguish between a non-linear basis and a basis which
is non-linear in its parameters: to illustrate,
$(a_0 + a_1 z + a_2 z^2 + a_3 z^3)$ is linear in its parameters
while $(b + b^2 z)$ is not. Whereas any least-squares
surface-fitting method whose basis is linear in its parameters can
equivalently be formulated as a convolution, surface-fitting with
a basis which is non-linear in its parameters cannot be thus
formulated.


Fig 3. One Dimensional Surface. (Labels: Step-Edgel; Direction of
Variation.)

We take into account the fact that the image consists of samples
of the true intensity profile blurred by the imaging system. The
standard deviation of this gaussian blurring function can be
determined by an examination of the image of a point or step-edge.
As a result of this blur, we have an image with no underlying
discontinuities. The spectrum is bandlimited, avoiding aliasing
and making sub-pixel localization possible.

The noise is generally assumed to be additive white gaussian. If
one could find the direction of the edge in a reliable fashion,
then the noise could be reduced by averaging data in a direction
parallel to the edge. This, of course, relies on the fact that the
window, i.e. the kernel of the operator, is small enough for the
edge-segment in it to be modeled as an edgel. We achieve the above
mentioned smoothing by fitting to each window a one-dimensional
oriented surface, i.e. a surface which is constant in one
direction, as shown in Fig 3 (the direction of invariance would be
parallel to the edgel). Fitting this 1-D surface is equivalent to
treating the data as strictly one dimensional, i.e. to projecting
it along the direction of invariance onto a plane.
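The projection can be made concrete as follows. This is a minimal sketch, assuming a square window stored as a list of rows, pixel centres measured from the window centre, and an angle (in radians, from the x axis, with y increasing with the row index) giving the direction of variation. The function name `project_window` is hypothetical.

```python
import math

def project_window(window, angle):
    """Collapse a square window to 1-D samples (z, intensity).

    z is the signed distance of each pixel centre from the window centre,
    measured along the direction of variation. Pixels that lie on the same
    line parallel to the edgel receive the same z.
    """
    half = (len(window) - 1) / 2.0
    c, s = math.cos(angle), math.sin(angle)
    samples = []
    for row, line in enumerate(window):
        for col, value in enumerate(line):
            x, y = col - half, row - half
            samples.append((x * c + y * s, value))
    samples.sort()
    return samples

# A made-up 5 x 5 window whose intensity varies only along the diagonal.
window = [[float(row + col) for col in range(5)] for row in range(5)]
samples = project_window(window, math.pi / 4.0)
```

Any 1-D profile (cubic, quadratic or tanh) can then be fitted to the `(z, intensity)` pairs, which is what makes the surface one-dimensional.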

Now we come to the question of a reliable direction-finder for
windows hypothesized to contain edgels. A first-approximation for
the direction of variation can be obtained from the gradient of a
least-square-error planar-fit to the window. However, this leads
to a substantial systematic bias for rectangular windows [12],
which is what we have used. A more general surface can be used to
refine this first estimate and reduce the biasing error. We fit a
least-squares one-dimensional cubic surface for this purpose. To
clarify, a 1-D cubic surface is constant in one direction and is
described by a cubic polynomial in the orthogonal direction.
Starting with the initial estimate of the direction, which is
obtained from the planar-fit, the search for the orientation of
the cubic-fit is generally not more than a few steps. It should
also be pointed out that for a window with an edgel, the plot of
the square-error vs angle for a 1-D cubic-fit is bowl-shaped and
centered around the true angle. Hence, once within the bowl,
standard techniques like Newton's Method can be used to find the
minimum. Appendix II contains all the relevant equations for the
various least-square-fits performed.


Fig 4. Adequate bases for edge-profiles in the image are
combinations of the tanh function with a constant. (Panel labels:
tanh; Window.)
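The first-approximation step can be sketched directly. For a square window with pixel coordinates centred on the window, the normal equations of the planar fit a*x + b*y + c decouple, so the gradient (a, b) has a closed form. The code below shows only this initial estimate; the refinement by a 1-D cubic fit and the equations of Appendix II are not reproduced, and the function name is hypothetical.

```python
import math

def planar_fit_direction(window):
    """Direction of variation from a least-squares plane a*x + b*y + c.

    With centred coordinates on a square window, sum(x) = sum(y) =
    sum(x*y) = 0, so a = sum(x*I) / sum(x^2) and b = sum(y*I) / sum(y^2).
    Returns (angle in radians, gradient magnitude).
    """
    half = (len(window) - 1) / 2.0
    sum_xi = sum_yi = sum_xx = 0.0
    for row, line in enumerate(window):
        for col, value in enumerate(line):
            x, y = col - half, row - half
            sum_xi += x * value
            sum_yi += y * value
            sum_xx += x * x          # equals sum(y^2) on a square window
    a, b = sum_xi / sum_xx, sum_yi / sum_xx
    return math.atan2(b, a), math.hypot(a, b)

# Made-up windows: intensity rising along x, and along the diagonal.
along_x = [[3.0 * col + 10.0 for col in range(5)] for _ in range(5)]
diagonal = [[float(row + col) for col in range(5)] for row in range(5)]
angle_x, slope_x = planar_fit_direction(along_x)
angle_d, slope_d = planar_fit_direction(diagonal)
```

The returned angle would serve as the starting point for the search over orientations of the cubic fit.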

It should be emphasized that there cannot be any one unique basis
which is appropriate to describe the image data in all windows. If
we attempt to do this, we will obtain incorrect results when the
basis is inadequate and noise-sensitive results if the basis is
not minimal. Perhaps, a simple illustration of this important
observation is called for. Consider a hypothetical situation where
we are given some noisy samples of "y," which is a polynomial in
"x," and are told to determine a description of the underlying
curve. For the sake of argument, assume that "y" is a quadratic
function of "x." Obviously, fitting a straight line
($y = a_0 + a_1 x$) to the data is not going to give us an
adequate description. More importantly, fitting a polynomial in
"x" which is of higher order than a quadratic, is going to have
non-zero coefficients for $x^i$, $i > 2$, owing to the noisy
nature of the data. Hence, even though polynomials of order
greater than 2 give a smaller least-squares-error than a
polynomial of order 2, it is desirable to fit a quadratic. The
reader will probably recognize this to be a restatement of the
underlying principle of linear regression analysis in statistics.
Considerations similar to the ones just detailed have been
investigated for one dimensional steps by Leclerc and Zucker [10].
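The polynomial illustration is easy to reproduce. The sketch below fits polynomials of order 1 to 4 to made-up noisy samples of a quadratic, using the normal equations and a small Gaussian elimination so that it needs no external library. The data, the perturbations and the function names are all hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    normal = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    moments = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve(normal, moments)

def squared_error(xs, ys, coeffs):
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

xs = [-2.0 + 0.5 * i for i in range(9)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05, 0.1]  # made-up values
ys = [1.0 + 2.0 * x + 3.0 * x * x + e for x, e in zip(xs, noise)]
fits = {d: polyfit(xs, ys, d) for d in (1, 2, 3, 4)}
errors = {d: squared_error(xs, ys, fits[d]) for d in fits}
```

The straight line leaves a large error, the quadratic recovers the underlying coefficients closely, and the higher orders lower the error slightly only by assigning non-zero coefficients to the noise.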

Now, consider the choice of an adequate basis. For most step-edges
the tanh function with a constant, i.e. $s \tanh(f[z + p]) + k$,
where $s$, $p$ and $k$ are the parameters and $f$ is a constant
determined by the "blur" of the imaging-system, will be adequate.
As can be seen from Fig 5, the maximum error in approximating an
ideal step-edge by the tanh is less than 1% of the step-size. One
important by-product of employing the tanh is a reliable estimate
of the contrast of the edge. From our case studies it seems that
the contrast is helpful not only in linking, but also in
interpretation. For roof-edges and line-edges, combinations of the
tanh function, as depicted in Fig 4, seem to be adequate bases.
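The quality of the tanh approximation to a gaussian-blurred ideal step can be checked numerically. The text does not give the value of the constant in the tanh argument, so the sketch below assumes 0.851 divided by the blur standard deviation, which follows from the well-known logistic approximation to the normal distribution function (scale factor 1.702). The symbols s, f, p and k follow the form of the basis as read from the text, and the function names are hypothetical.

```python
import math

def blurred_step(z, sigma):
    """Ideal step from -1 to +1 convolved with a gaussian of std sigma."""
    return math.erf(z / (sigma * math.sqrt(2.0)))

def tanh_model(z, s, f, p, k):
    """Step-edge basis: s * tanh(f * (z + p)) + k."""
    return s * math.tanh(f * (z + p)) + k

sigma = 1.0
f = 0.851 / sigma        # assumed constant, not taken from the paper
zs = [i * 0.01 for i in range(-600, 601)]
worst = max(abs(blurred_step(z, sigma) - tanh_model(z, 1.0, f, 0.0, 0.0))
            for z in zs)
relative = worst / 2.0   # the step-size of this profile is 2
print(relative)
```

Under this assumption the largest deviation stays below one percent of the step-size, in line with the figure quoted in the text.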

Some authors have tried to detect edges which have large
deviations from an ideal step-edge by using multiple scales.
Multiple scales, we believe, are unnecessary and undesirable in
this case. For such edges, the tanh basis is inadequate and a
cubic or a tanh with a cubic might be adequate. The latter has
some problems because the tanh and cubic are not completely
independent. It should also be noted that the cubic is inadequate
for most step edges and that derivative estimates based on a
cubic-fit can be quite unreliable due to the wiggles which are
characteristic of polynomials. It may be desirable to employ
splines when the tanh and the cubic are inadequate bases. We have
used a cubic, with a check for consistency in position estimate
with the tanh-fit, in one version of our detector. Our window is
too small (5 x 5) for finding the parameters of a tanh with a
cubic or of splines, in the case of horizontal and vertical edges.
The position estimate based on the zero-crossing of the second
derivative for a cubic-fit is biased for reasons similar to those
listed in Appendix I. Hence, when this bias is large, refinement
of the initial estimate may be desirable. If one uses a general
basis like the cubic, it is also desirable to confirm that a
dominant component of the cubic-fit is indeed a step-edge. In our
implementation we accomplish this by basing our estimate of the
step-size on a tanh-fit even if the basis used for detection is
the cubic. We do not consider our handling of non-ideal steps to
be completely satisfactory.

We compare the least-square-error of a quadratic-fit with that
of a tanh-fit and choose the one with the smaller error to
determine the existence or absence of a step-edgel†. This
discriminates against smooth shading and reduces the
significance of subsequent thresholding. In the initial stages,
we had used the F-Statistic to determine the adequacy of the
basis. It was found that this was unnecessary and perhaps
undesirable because of inadequate modeling of the error. A
procedure similar to the one just described can be used to
detect steps with large deviations from an ideal step-edge. For
example, if a cubic basis is being used, then the 1, 20
F-Statistic corresponding to the quadratic and cubic fits should
be employed to verify the appropriateness of the cubic-fit.


†It should be noted that both the fits have the same number of
unknown parameters. This justifies our comparison of the two
least-square-errors to determine which basis describes the data
more accurately. The formulation of the F-Statistic
corresponding to the tanh and quadratic fits is not possible,
even if one ignores the non-linearity, because the bases are not
nested.


Fig 5. Error profile resulting from the approximation of an
ideal unit step-edge by y = 0.5 tanh(0.85 z / σ_blur)
(vertical axis: Error)

At this juncture, we would like to bring to the reader's notice
some of the reasons to expect an improved performance from the
use of a directional tanh-surface. First, our basis requires the
specification of only four parameters which determine the
orientation, the position and the upper and lower intensities of
the step-edge. It is immediately seen that this is the minimum
number required to describe a step-edgel with predetermined
"blur". Contrast this with eight required by Hueckel's Method
(0) and ten by Haralick [*]. As a result, we can use smaller
windows than most previous equally sophisticated approaches.
This implies better resolution capabilities and improved
performance on high-curvature edges. Second, the highly
constrained nature of our basis (which is borne out by the
presence of only four unknown parameters) should be expected to
offer noise-robustness analogous to matched-filtering
classification wherein noisy patterns are categorized based on
their closest "match" to noiseless representatives of the
different classes. Our approach distinguishes between two
classes: step-edgels and non-step-edgels; step-edgels are
characterized by a step-component of variable intensity,
orientation and position. Non-step-edgels can be better
described by quadratic surfaces. Of course, these assumptions
may break down as the window size is increased.

We have carried out our initial investigation for step-edges,
which are by far the dominant type. Numerically, it was
determined that for σ_blur = 1 and an ideal step-edge, the
optimum scaling factor for the argument of the tanh function was
0.87. This factor was determined by minimizing the square-error.
This is not surprising, as equating the slopes of the two
functions at the origin would give us a value of 0.8. Hence, a
rule of thumb for the scaling factor is (0.85/σ_blur). The
normalized error-profile, using this factor, is shown in Fig. 5.
The detection scheme is not extremely sensitive to this factor
and, in fact, it detects reasonably diffuse shadows.

Fig 6. Ambiguity in profile, given 3 symmetric samples of a
step-edge cross-section (panels: Step-edge; Smooth shading).
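
The two scaling factors mentioned above (the least-square-error optimum and the slope-matching value) can be reproduced approximately with a brute-force search. This is a hedged sketch, not the authors' procedure: the integration interval, the grid resolution and the function names are assumptions, and the exact optimum depends on the interval over which the square-error is accumulated.

```python
import math


def squared_error(f, half_width=4.0, step=0.01):
    """Sum of squared differences between a blurred unit step
    (sigma_blur = 1) and 0.5.tanh(f.z) on a regular grid of z values."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * step
        ideal = 0.5 * math.erf(z / math.sqrt(2.0))
        total += (ideal - 0.5 * math.tanh(f * z)) ** 2
    return total


def best_scale_factor(lo=0.70, hi=1.00, step=0.001):
    """Grid search for the scaling factor with the smallest square-error."""
    count = int(round((hi - lo) / step)) + 1
    candidates = [lo + i * step for i in range(count)]
    return min(candidates, key=squared_error)


# Matching the slopes of 0.5.erf(z / sqrt(2)) and 0.5.tanh(f.z) at z = 0
# gives f = sqrt(2 / pi), roughly 0.8, as stated in the text.
SLOPE_MATCH_FACTOR = math.sqrt(2.0 / math.pi)

if __name__ == "__main__":
    print(SLOPE_MATCH_FACTOR, best_scale_factor())
```

For a blur other than 1, both factors scale as 1/σ_blur, which is where the rule of thumb comes from.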

The window size is determined by the standard deviation of the
blurring gaussian. It is not hard to see that the minimum window
size, irrespective of the blur, has to be larger than (3 x 3)
because, as illustrated in Fig 6, there is no way to distinguish
a horizontal or vertical step-edge from smooth shading if we
take three symmetric samples of the edge. We chose (5 x 5)
square windows. Not surprisingly, detection of zero-crossings of
the second derivative requires a minimum lateral support of 5
pixels in the symmetric case. As the window size is increased
for a fixed blur, we trade-off resolution for improved detection
and localization of locally straight edges. However, the
detection and localization of high-curvature edges will
deteriorate due to the invalidity of our implicit edgel-model.
Resolution refers to the minimum support required for the
detection of an edgel, i.e. an edgel is theoretically resolvable
if it can be isolated within a window. If it is not detected,
it is due to the inadequacy of the edge-detector. For example,
if three parallel edges are spaced at 2-pixel intervals, then
with our choice of support, the middle edge would not be
resolvable, but the other two might be. We will point out
examples of these in our first case study. It should be noted
that we have not investigated the trade-offs accompanying
different window shapes.


V. Outline of Algorithm for Step-Edgel Detection

The following is an outline of the procedure used to detect the
presence of an edgel in an image-window. This procedure is to be
repeated over the whole image by shifting the window in 1-pixel
steps in the x and y directions.

(All the relevant equations and statistics are listed in
Appendix II.)

(i) Perform a least-squares planar fit to the window and use
the gradient of this fit to estimate the direction of variation
in the window, assuming that the underlying intensity surface
is 1-D.

(ii) Refine the estimate of the direction of variation by
fitting a 1-D cubic surface with the least-square-error
criterion. The resulting equations are non-linear in the angle.
However, owing to the reliable initial estimate, the search is
typically a couple of steps. We find the angle to the nearest
5°.

(ii)' [Optional] Calculate the 2, 20 F-statistic for the planar
and cubic fits obtained in (i) and (ii). If it is less than the
75% threshold, then declare the absence of an edgel. This
thresholding serves the purpose of reducing computation by
considering only those windows which exhibit a statistically
significant reduction in the least-square-error by employing a
cubic basis rather than a planar one.

(iii) Find the least-squares 1-D tanh surface oriented in the
direction found in (ii). The tanh-fit is localized to the
nearest 0.1 pixel. As will be seen in Section VII, for low and
moderate S.N.R.'s, the position accuracy is not determined by
the quantization error associated with the search steps.

(iv) Find the least-squares 1-D quadratic surface oriented in
the direction found in (ii). If the least-square-error in this
case is less than that for the tanh-fit, then declare the
absence of an edgel.

(v) The least-squares tanh-fit performed in (iii) determines
the intensities on the two sides of the step and its position in
the window. The sum and difference of the constant term in the
basis and the coefficient of the tanh term determine the
intensities, while the position of the step-edge is given by the
displacement of the tanh term.

(vi) Threshold on the step-size determined from (v). To improve
the reliability of the detection process, it may also be
desirable to require the edgel to be localized within some
central sub-window.

N.B. If one wants to detect step-edgels which have large
deviations from an ideal step-edge, steps similar to (iii) and
(iv), but with a basis different from the tanh, will have to be
added. Of course, the appropriate statistical formulation will
also have to be used.
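
The first step of the outline, a least-squares planar fit whose gradient gives the initial direction of variation, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' Pascal implementation: the coordinate origin is placed at the window center with y increasing upward, NumPy's general least-squares solver stands in for whatever closed form Appendix II uses, and the function names are hypothetical.

```python
import numpy as np


def planar_fit_direction(window):
    """Fit a + b*x + c*y to the window by least squares.

    Returns (coefficients, least-square-error, gradient angle in degrees).
    Assumes the origin is at the window center and row 0 is the top row.
    """
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (xs - (w - 1) / 2.0).ravel()
    y = ((h - 1) / 2.0 - ys).ravel()
    design = np.column_stack([np.ones_like(x), x, y])
    values = window.ravel().astype(float)
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    residual = values - design @ coeffs
    angle = np.degrees(np.arctan2(coeffs[2], coeffs[1])) % 360.0
    return coeffs, float(residual @ residual), float(angle)


def quantize_angle(angle_deg, step=5.0):
    """Round an angle to the nearest multiple of `step` degrees."""
    return (round(angle_deg / step) * step) % 360.0


if __name__ == "__main__":
    rows, cols = np.mgrid[0:5, 0:5]
    ramp = 10.0 + 2.0 * (cols - 2) + 3.0 * (2 - rows)  # an exact plane
    print(planar_fit_direction(ramp))
```

The quantized angle would then seed the search over the 1-D cubic fit in step (ii).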


VI.  An  Example 

We now proceed to illustrate the algorithm outlined in the
previous section with an example. Consider the image-window in
Fig 7-b which is a noisy version of that in Fig 7-a. The
underlying intensity step-edge shown in Fig 7-a has grey-levels
64 and 128 on its two sides and σ_blur = 0.6. The edge is
located at a distance of 0.2360 pixel from the center of the
window and at an angle of 34.4° to the x-axis. The noise in
Fig 7-b is additive white gaussian with σ_noise = 8. The
detected edge is located at a distance of 0.1679 pixel from the
center of the window

    128  128  128  124  108
    128  127  118   97   75
    124  110   86   69   64
     99   76   66   64   64
     70   64   64   64   64

Fig 7-a. Example: Original image-window with step-size = 64

Fig 7-b. Example: Noisy image-window with
(step-size / σ_noise) = 8

and at an angle of 30° to the x-axis. The error in position is
0.0681 pixel and the error in angle is -4.4°. Recall that the
position quantization error is ±0.05 pixel and that the angle
quantization error is ±2.5°. As mentioned in the previous
section, the relevant equations are listed in Appendix II. The
z-axis shown in Fig 7-c is the estimated direction of variation
in the window and is orthogonal to the estimated orientation of
the edge.

(i) Least-Squares Planar-Fit

I(x, y) = 71.27 - 7.31x + 16.52y

Least Squares Error = 2683

θ0 = tan⁻¹(16.52 / -7.31) = 115° (to nearest 5°)

θ0 is the direction of the gradient of the planar-fit and is
used as a first estimate for the direction of variation in the
window.

(ii) Least-Squares 1-D Cubic-Fit

I(x, y) = 71.71 + 21.83z + 6.10z² - 2.16z³
z = x.cos(θ) + y.sin(θ), θ = 120° (to nearest 5°)
Least Squares Error = 1205

θ, a refined estimate of θ0, is the final estimate of the
direction of variation in the window and is orthogonal to the
direction estimate for the edgel, if any.

(ii)' [Optional]

The 2, 20 F-Statistic for the planar and cubic fits is 10.7 and
it does exceed the 75% threshold, which is 1.49. Hence, we
continue with the rest of the algorithm.
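
For readers who want to reproduce the optional test, the sketch below shows the standard F-ratio for nested least-squares fits and a closed-form quantile that is valid only when the numerator has 2 degrees of freedom. The degrees of freedom 2 and 20 are those quoted in the text; the function names are hypothetical and the paper's own formulation is in Appendix II, which is not reproduced here.

```python
def f_statistic(sse_reduced, sse_full, extra_params, residual_dof):
    """F-ratio comparing a reduced fit (e.g. planar) with a fuller,
    nested fit (e.g. cubic).

    extra_params : number of additional parameters in the full fit
    residual_dof : samples minus parameters of the full fit
    """
    return ((sse_reduced - sse_full) / extra_params) / (sse_full / residual_dof)


def f_quantile_num_dof_2(residual_dof, prob):
    """Quantile of the F(2, n) distribution.

    Uses the closed form of the F(2, n) distribution function,
    1 - (1 + 2x/n)^(-n/2); it does not apply to other numerator dof.
    """
    n = float(residual_dof)
    return (n / 2.0) * ((1.0 - prob) ** (-2.0 / n) - 1.0)


if __name__ == "__main__":
    # 75% point of F(2, 20), the threshold used in step (ii)'.
    print(f_quantile_num_dof_2(20, 0.75))
```

A window would proceed to the tanh-fit only if its F-ratio exceeds this threshold.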

(iii) Least-Squares 1-D Tanh-Fit along θ

I(x, y) = 95.57 + 32.52 tanh((0.85/σ_blur).(z - p))
p = 0.9 (to nearest 0.1 pixel)
z = x.cos(120°) + y.sin(120°)

Least Squares Error = 1203

p is the estimate of the position of the edgel along the z-axis.

Fig 7-c. Example: Underlying and detected edges
Underlying Edge (Angle = 34.4 degrees, Distance from Center =
0.236 pixel); Detected Edge (Angle = 30 degrees, Distance from
Center = 0.168 pixel)


(iv) Least-Squares 1-D Quadratic-Fit along θ

I  \i .  y\  —  77. '0  +  1 5.S6 r  +  14a:* 

z = x.cos(120°) + y.sin(120°)

Least Squares Error = 2615

The quadratic-fit least-squares-error is more than the tanh-fit
least-squares-error. Hence, an edgel has been detected.

(v) Edge Parameters

The intensities on the two sides of the step are estimated from
(iii) to be 63.1 and 128.1 (i.e. 95.57 ± 32.52). The
orientation of the edgel is determined from (ii) to be 30°,
i.e. orthogonal to θ, the direction of variation. Its position
is determined from (iii) to be 0.9 pixel from the origin along
the z-axis or equivalently 0.1679 pixel from the center of the
window.
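
Steps (iii) to (v) of the example can be sketched as a search over the tanh displacement with a linear least-squares solve at each candidate position. This is an assumption-laden illustration, not the authors' code: the origin is placed at the window center (the example above evidently uses a different origin), the displacement is searched on a fixed 0.1-pixel grid, and all function names are hypothetical.

```python
import numpy as np


def z_coords(shape, theta_deg):
    """z = x.cos(theta) + y.sin(theta) per pixel, origin at the window
    center, y increasing upward, flattened in row-major order."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = xs - (w - 1) / 2.0
    y = (h - 1) / 2.0 - ys
    t = np.radians(theta_deg)
    return (x * np.cos(t) + y * np.sin(t)).ravel()


def lstsq_sse(design, values):
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    residual = values - design @ coeffs
    return coeffs, float(residual @ residual)


def tanh_fit(window, theta_deg, f):
    """Best k + s.tanh(f.(z - p)) with p on a 0.1-pixel grid."""
    z = z_coords(window.shape, theta_deg)
    values = window.ravel().astype(float)
    best = None
    for p in np.arange(-25, 26) * 0.1:
        design = np.column_stack([np.ones_like(z), np.tanh(f * (z - p))])
        (k, s), sse = lstsq_sse(design, values)
        if best is None or sse < best[3]:
            best = (float(k), float(s), float(p), sse)
    return best


def quadratic_sse(window, theta_deg):
    z = z_coords(window.shape, theta_deg)
    design = np.column_stack([np.ones_like(z), z, z * z])
    return lstsq_sse(design, window.ravel().astype(float))[1]


def detect_edgel(window, theta_deg, sigma_blur=0.6):
    """Return edge parameters, or None if the quadratic fits better."""
    k, s, p, sse_tanh = tanh_fit(window, theta_deg, 0.85 / sigma_blur)
    if quadratic_sse(window, theta_deg) < sse_tanh:
        return None
    return {"low": k - abs(s), "high": k + abs(s),
            "step": 2.0 * abs(s), "position": p}
```

The sum and difference of the constant term and the tanh coefficient give the two intensities, as in step (v); a contrast threshold (step (vi)) would be applied to the returned step-size.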


VII. Statistical Data

We now present some statistical results obtained from our
edge-detector. The algorithm outlined in steps (i) through (vi),
excluding (ii)', of Section V, was implemented. Let us begin by
clarifying our notation. Signal-to-Noise-Ratio (S.N.R.) is
defined as (step-size / σ_noise) where σ_noise is the standard
deviation of the noise. The noise is assumed to be additive
white gaussian. A false-positive occurs when no edge is present
in the window and an edge with contrast greater than the
threshold is declared. A true positive occurs when an edge is
present in the window and it is identified as such, with its
contrast greater than the threshold, the error in position
(perpendicular distance from the center of the window) less
than 0.7 pixels (half the diagonal of a pixel-support) and the
error in angle less than 15°. σ_position is the
root-mean-square of the error in the position and σ_angle is
the root-mean-square of the error in the angle. We use σ to
denote the r.m.s. values because they closely approximate the
standard deviation of the errors. This is a consequence of the
bias in the position and angle estimates being relatively small.
The threshold is on the edge-contrast and is always stated in
units of σ_noise.

Fig. 8 shows a plot of false positives vs threshold. Windows of
size (5 x 5) with a constant intensity surface, σ_blur = 0.6
and additive white gaussian noise were used for this simulation.
The value of σ_blur was chosen to be 0.6 because this was found
to be its estimate in the real images considered in the next
section. It cannot be much smaller than 0.6 as then we should
expect aliasing and if it is much larger, the edge is at a
larger scale and we need a correspondingly larger support.
Notice that even for a zero threshold, false positives are
declared in only about a third of the cases. This is in contrast
with gradient thresholding schemes which would give 100% false
positives. That is because our detection scheme requires a
certain step-like "correlation" among the samples for an edge to
be declared. This requirement stems from our choice of the tanh
as a basis. We have F.P. < 2.5% for a threshold of 1.5σ_noise;
F.P. < 0.2% for 2σ_noise and F.P. < 0.01% for 2.5σ_noise.

Fig 8. Plot of false positives detected in windows with
constant intensity and additive white gaussian noise, as a
function of the threshold (horizontal axis: Threshold (σ_noise),
0.0 to 3.0)

Fig 9 shows a plot of the true positives vs. the threshold.
Square (5 x 5) windows with ideal step-edges, σ_blur = 0.6 and
additive white gaussian noise were used for the simulation.
Each step-edge passed through a 1-pixel square in the center of
the window and its position and angle were independently
uniformly distributed. Constraining the edge to pass through the
central pixel-support is justified because each segment of an
edge, which is not near the picture border, will pass through
the center-pixel of one window or another. To reduce the
contribution of gray scale quantization effects, the edge
contrast was chosen to be 64 levels on a scale of 0-255. Notice
that even for zero threshold, we do not get 100% detection for
low S.N.R.'s. In contrast, gradient thresholding schemes would
give 100% true positives. But then, they would declare any
distribution to be an edge! Thus they would have 100% false
positives, too. Also notice the relatively flat profile of the
plots when the S.N.R. is less than the corresponding threshold
(the "knee" of the plot for a particular S.N.R. occurs when the
threshold is equal to the step-size). If we synthesized images
rather than windows, we should expect somewhat higher detection
since each non-border edge-segment in a pixel-support is
"scanned" 25 times and, as pointed out earlier, we detect
edgels and not edge-points.

Fig 9. Plot of true positives detected in windows with
synthesized step-edges and additive white gaussian noise, as a
function of the threshold for different S.N.R.'s.


Fig. 10 shows the plot of σ_angle vs S.N.R. for the true
positives which would be detected in Fig. 9 if the threshold
were zero and the constraints in the position-error and
angle-error were removed. This curve decays to σ_angle ≈ 1.9°
for large S.N.R.'s (the diagram is cut-off at S.N.R. = 8). This
is about 30% more than what we should expect from the
quantization error for a uniformly distributed random variable
[10]. It suggests that the bias associated with cubic-fit angle
estimates, from the (5 x 5) windows, is small in comparison to
the quantization interval, i.e. 5°.

Fig. 11 shows the plot of σ_position vs S.N.R. under the same
conditions as in Fig. 10. This curve decays to
σ_position ≈ 0.032 for large S.N.R.'s (the diagram is cut-off
at S.N.R. = 8). This differs by about 10% from what we expect
from the quantization error for a uniformly distributed random
variable. This suggests that the position estimates from the
(5 x 5) windows have a negligible bias in comparison to the
quantization interval. In Appendix III we derive an expression
for σ_position for the 1-D high-S.N.R. case. It can be shown
that this is equivalent to a vertical or horizontal edge in the
present simulation, with the effective S.N.R. being √5 times
the actual S.N.R. It turns out that the values we would expect
for vertical or horizontal edges using the expression derived in
the Appendix are within 25% of those shown in Fig. 11. This is
despite the fact that the errors in the angle estimates
propagate to introduce errors in the position estimate. The
asymptotic value is within 10%. This confirms the domination of
the quantization error for high S.N.R.'s.

Fig 10. Plot of the standard deviation of the angular error in
the true positives detected with threshold = 0, vs the S.N.R.
(horizontal axis: S.N.R. (step-size / σ_noise), 0.0 to 8.0)

Fig 11. Plot of the standard deviation of the positional error
in the true positives detected with threshold = 0, vs the S.N.R.
(horizontal axis: S.N.R. (step-size / σ_noise), 0.0 to 8.0)
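
The quantization-error baselines used in the last two paragraphs follow from the standard deviation of a uniformly distributed error, which is the quantization interval divided by √12. A small sketch (function names are hypothetical) makes the comparison explicit for the 5° angle interval and the 0.1-pixel position interval:

```python
import math


def uniform_quantization_rms(interval):
    """R.m.s. error of rounding to a grid with the given spacing,
    assuming the error is uniform over one interval."""
    return interval / math.sqrt(12.0)


def relative_excess(observed_rms, interval):
    """How far an observed r.m.s. error exceeds the quantization floor."""
    return observed_rms / uniform_quantization_rms(interval) - 1.0


if __name__ == "__main__":
    print(uniform_quantization_rms(5.0))   # angle floor, in degrees
    print(uniform_quantization_rms(0.1))   # position floor, in pixels
    print(relative_excess(0.032, 0.1))     # position value quoted above
```

With the asymptotic position value of 0.032 quoted above, the excess over the floor comes out at roughly 10%, in line with the text.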

The reader may also wish to know the effect of the inclusion of
step (ii)' on the statistics. Although the shapes of the
false-positives and true-positives plots remain more or less the
same, their sizes get scaled. The false-positives plot now
starts out at under 20% for a zero threshold and decays to
F.P. < 1% for a threshold of 1.5σ_noise; F.P. < 0.1% for
2σ_noise and F.P. < 0.01% for 2.5σ_noise. The plot of
true-positives for S.N.R. = 8 remains unchanged, while the plots
for lower S.N.R.'s start out lower: S.N.R. = 4 now starts out
at 78% instead of 88%, S.N.R. = 3 at 58% instead of 76%,
S.N.R. = 2 in the 30-40% range instead of the 50-60% range, and
S.N.R. = 1 at 11% instead of 22%. The plots of σ_position and
σ_angle remain approximately the same.

Comparisons of the statistics of various operators are valid
only if the size of the support used to make decisions is the
same. As indicated in the introduction, this need not
necessarily be the same as the window size. Increasing the
support size, which in our case is (5 x 5), would increase the
fraction of true positives and decrease the fraction of false
positives, for any given S.N.R. Also, σ_position and σ_angle
would decrease. This, however, is at the expense of resolution
between adjacent edges and the detection and localization of
high-curvature edges.

We end this section with a word of caution. The analysis in the
Appendix and the statistical data of this section are for ideal
step-edges. They can, at best, only be indicative of the
performance on real images owing to the numerous simplifications
and assumptions invoked. For example, non-constant intensity
surfaces have a higher likelihood of false positives than
constant surfaces like those used for the statistics. The
results are of no value if our inherent edge-model is seriously
flawed. Hence, although theoretical and statistical support is
desirable, the proof of the pudding is in the eating.


VIII. Three Case-Studies and a Comparison

It is essential to point out some details concerning the
photographs. a) Only the step-edgel detector (excluding step
(ii)') has been implemented. (Including step (ii)' marginally
affects the detection of low-contrast edges.)

h)  The  .  dgi  l-i.  ...  f  their  (signs 

proportion  I  <  t  v  t  '  :.,.rict eristics 

of  the  di-p  y  ■  '  •  •  ■  •••<  seem 

thick  r  tb  m  tio-y  :u  '  also 

occurs  the  high'i:!;’  '  ''is 

can  easily  he  con!.  \  um  super* 

imposed  images  (,  i  .;•*  •■■:■■•  have  I 

threshol  led  in  all  cases,  for  a  about 

2  e(Tn  ,,.  (!)  edges  di-pl.i; 1  ;  eoi.,| . ,  :  of  mu' 

edgels with no post-processing, like linking, thinning, cleaning
etc. e) Degradation resulting from the various reproduction
processes would make it difficult to confirm some of the edges
present in the original image. This is particularly true in the
high intensity regions which saturate the display well below the
highest gray level. f) The pictures with the edgels and the
superimposed edgels are displayed on a grid with twice the
linear resolution of the original image because of our sub-pixel
localization. Further, pixels in the vicinity of edgels have
been reduced to the lowest gray level, for clarity. g) It is
important to bear the size of the original image in mind when
scrutinizing the pictures.

Fig 12-a. Bin of Parts: Original Image (128 x 128)

Fig 12-b. Bin of Parts: Edgel Image

Fig 12-c. Bin of Parts: Superimposed Image

(i) Industrial Setting: Bin of Parts
(Size: 128 x 128; σ_blur: 0.6; σ_noise: 3)

Refer to Figs. 12-a (the original image), 12-b (the edgel image)
and 12-c (the superimposed image). This picture was chosen to
demonstrate the resolution capability of the detector and its
performance on high-curvature edges. The pins of the various
parts have false negatives. This is because they are bounded by
dark lines, and our edge detector has currently been implemented
only for step-edgels. The outer edges of the lines have been
detected, although not well-localized, but the inner edges
exceed the resolution capabilities of our detector. Notice that
some of the circular regions detected have a diameter of just a
few pixels.

Fig 13-a. San Francisco Bay: Original Image (256 x 256)

Fig 13-b. San Francisco Bay: Edgel Image

Fig 13-c. San Francisco Bay: Superimposed Image


(ii) Aerial View: San Francisco Bay
(Size: 256 x 256; σ_blur: 0.8; σ_noise: 5)

Refer to Figs. 13-a (the original image), 13-b (the edgel image)
and 13-c (the superimposed image). This picture was chosen
because of its complexity. On first glance, it may seem that
there are a host of false positives. However, a closer
examination of the superimposed image reveals this to be untrue.
The long lines in the sea correspond to silt lines. It may not
be possible to confirm them in the photographs you will see. In
any case, notice the continuity in most edges. Long continuous
false positives are statistically unlikely. Also notice the
detection of the small island in the mid-right of the image. In
the superimposed image, the edgels are seen to impose a
structure based on local intensity changes.


(iii) Indoor Scene: Telephone, Cup and Pencil
(Size: 256 x 256; σ_blur: 0.6; σ_noise: 4)

Refer to Figs. 14-a (the original image), 14-b (the edgel image
using the tanh-fit), 14-c (the edgel image using the tanh/cubic
fit) and 14-d (the superimposed image). This image was chosen to
illustrate the inadequacy of the tanh basis to deal with
step-edges having a large non-zero slope on either side. Note
the top surface of the telephone. It does not correspond to an
ideal step-edge, but to a generalized step which some detectors
might find by using a larger scale. The same is true of the top
edge of the pencil. As can be seen from Fig. 14-b, these edges
are missed if we use a tanh-fit at a single scale. This is a
result of the inadequacy of the basis, i.e. the tanh and a
constant cannot closely approximate step-edges which have large
deviations from zero slope on either side. Using a tanh/cubic
fit, as explained in Section IV, rectifies this without recourse
to a different scale, as evident from Fig. 14-c. The only
prominent edge which seems to have been missed is that of the
table behind the book. This was due to its larger scale, which
was confirmed by the examination of its profile at the
individual pixel detail. The inner portion of the flower on the
cup has a few false negatives due to lack of resolution. The
superimposed image once again exhibits localization.

Fig 14-a. Indoor Scene: Original Image (256 x 256)

Fig 14-b. Indoor Scene: Edgel Image (tanh)

Fig 14-c. Indoor Scene: Edgel Image (tanh/cubic)

Fig 14-d. Indoor Scene: Superimposed Image (tanh/cubic)


(iv) A Comparison: Bin of Parts
(Size: 256 x 256; σ_blur: 0.9; σ_noise: 2)

Refer to Figs. 15-a (the original image), 15-b (the edgel image
for our detector), 15-c (the edge image for a version of the
Marr-Hildreth Operator† [11]) and 15-d, -e (the corresponding
superimposed images). Our detector used a tanh/cubic fit, as
explained in Section IV. In order to facilitate a comparison
between the two superimposed images, we zoom-in on (128 x 128)
subsections in Figs. 15-f, -g, -h and -i. For reasons mentioned
in the beginning of this section, it might not be possible to
confirm all the detected edges. In any case, a careful
examination is instructive to discover the differences in
performance between the two operators with respect to detection,
resolution and localization (especially of high curvature
edges).

Fig 15-a. Original Image (256 x 256)

Fig 15-b. Edgel Image - Our Detector (tanh/cubic)

Fig 15-c. Edge Image - Marr-Hildreth Operator

Fig 15-d. Superimposed Image - Our Detector (tanh/cubic)

Fig 15-e. Superimposed Image - Marr-Hildreth Operator

Fig 15-f. Close-Up of Superimposed Image - Our Detector
(tanh/cubic)

Fig 15-g. Close-Up of Superimposed Image - Marr-Hildreth
Operator

Fig 15-h. Close-Up of Superimposed Image - Our Detector
(tanh/cubic)

Fig 15-i. Close-Up of Superimposed Image - Marr-Hildreth
Operator

†The choice of the Marr-Hildreth Operator was based solely on
convenience. It was used by S.R.I. International for the I.T.A.
Project in which Stanford was also a participant. As the
displayed image was among those used in the Project, we report
that the operator has been tuned for optimum performance on it.
The implementation used the Difference of Gaussians (D.O.G.)
with σ_1 = 1.6, with a (11 x 11) support, and σ_2 = 1, with a
(7 x 7) support. The choice of σ_1/σ_2 = 1.6 results in a close
approximation to the Laplacian of a Gaussian [11]. The
zero-crossings were thresholded on their slope. It is
conceivable that a different implementation of the operator will
produce better results, but it is unlikely that the improvement
will be dramatic.

LX.  Conclusion 

This paper deals with the problem of edge-detection
using directional one-dimensional surfaces.
Edges were defined in terms of short, linear segments
called edgels. Detection of edgels was claimed to be
more appropriate than that of edge-pixels. Some
shortcomings of derivative operators were then
presented. An adequate basis for most step-edgels
was shown to be the tanh. It is likely that other adequate
bases exist and, in fact, if one were going to use
a table look-up to perform surface fitting, the exact
profile of the ideal step-edge can be stored. This is
the integral of a gaussian and has no closed-form solution.
A detailed discussion on the design of an operator
was followed by an outline of the algorithm and
an example. Robustness to noise, sub-pixel position
localization (< 1/3) and better than 10° angular
localization were statistically established for
S.N.R. > 2.5. This was accompanied by some simple
analysis and a variety of images demonstrating the
performance of our operator. In the course of the
paper, it was indicated that our handling of non-ideal
step-edges by using a cubic basis is not completely
satisfactory.

An attempt was made to highlight some of the
issues and concerns in edge-detection, as we see them.
Analytical, statistical and empirical tools were
employed to demonstrate the performance of the proposed
algorithm. No attempt has been made yet at
computational efficiency. We have concerned ourselves
solely with adequacy. The current implementation
is in Pascal on a VAX 11/780. The processing
time is typically 20 C.P.U. minutes for a (128 x 128)
image. We expect to reduce that by a factor greater
than 2. The algorithm is implementable as a strictly
parallel process and has natural extensions for roof
and line edges.

Appendix  I  :  Zero-Crossing  Bias 

Let $E(x)$ be a generalized step of height $S$ at the origin,
and $G(x)$ be a normalized gaussian with "standard
deviation" $\sigma_{blur}$.

\[
E(x) = \begin{cases} k_1 x & \text{if } x < 0 \\ k_2 x + S & \text{if } x > 0 \end{cases}
\]

Then $(E(x) * G(x))$ is the corresponding step-edge
(where $*$ denotes convolution) and it can be shown
that $(E(x) * G(x))'' = E(x) * G''(x)$.

\[
\begin{aligned}
E(x) * G''(x) &= \int_{x}^{\infty} k_1 (x-u)\, G''(u)\, du + \int_{-\infty}^{x} [k_2 (x-u) + S]\, G''(u)\, du \\
&= \int_{-\infty}^{\infty} k_1 (x-u)\, G''(u)\, du + \int_{-\infty}^{x} [(k_2 - k_1)(x-u) + S]\, G''(u)\, du \\
&= [S - (k_1 - k_2)\, x]\, G'(x) + (k_1 - k_2)\, [x\, G'(x) - G(x)] \\
&= S\, G'(x) - (k_1 - k_2)\, G(x)
\end{aligned}
\]

Equating this to zero and using $G'(x) = -(x/\sigma_{blur}^2)\, G(x)$, we get $x = \Delta\, \sigma_{blur}^2 / S$,

where $\Delta = (k_2 - k_1)$ and $S$ is the step-size.
This is the biased zero-crossing of the second derivative.
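
The bias can be checked numerically. The following is a minimal sketch, assuming the generalized step has slope `k1` to the left of the origin and `k2` to the right; the function names are illustrative and do not come from the paper.

```python
import math


def gaussian(x, sigma):
    """Normalized Gaussian G(x) with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_d1(x, sigma):
    """First derivative G'(x) = -(x / sigma^2) G(x)."""
    return -(x / (sigma * sigma)) * gaussian(x, sigma)


def second_derivative_of_blurred_step(x, k1, k2, step, sigma):
    """(E * G)''(x) = S G'(x) - (k1 - k2) G(x) for the generalized step."""
    return step * gaussian_d1(x, sigma) - (k1 - k2) * gaussian(x, sigma)


def zero_crossing_bias(k1, k2, step, sigma):
    """Zero crossing of the second derivative: x = (k2 - k1) sigma^2 / S."""
    return (k2 - k1) * sigma * sigma / step


if __name__ == "__main__":
    bias = zero_crossing_bias(0.1, 0.5, 10.0, 1.0)
    print(bias, second_derivative_of_blurred_step(bias, 0.1, 0.5, 10.0, 1.0))
```

When the two slopes are equal the bias vanishes, and it grows with the square of the blur, which is why a stronger blur worsens localization for sloped steps.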

Appendix  II  :  Least-Squares  Criteria  &  Statistics

Least-Squares Criterion for a Planar-fit :

\[ \epsilon_p = \sum_{x} \sum_{y} \left( Image[x,y] - (a_0 + a_1 x + a_2 y) \right)^2 \]

(minimize w.r.t. $a_0$, $a_1$ and $a_2$)

Initial Estimate of $\theta$ : $\theta_0 = \tan^{-1}(a_2 / a_1)$

Least-Squares Criterion for a Quadratic-fit :

\[ \epsilon_q = \sum_{x} \sum_{y} \left( Image[x,y] - (a_0 + a_1 z + a_2 z^2) \right)^2 \]

(minimize w.r.t. $a_0$, $a_1$ and $a_2$)
$z = x \cos(\theta) + y \sin(\theta)$

$\theta$ is determined from the L.S.E. cubic-fit and is
the angle by which the axes have to be rotated to
align the x-axis with the edgel cross-section.

Least-Squares Criterion for a Cubic-fit :

\[ \epsilon_c = \sum_{x} \sum_{y} \left( Image[x,y] - (a_0 + a_1 z + a_2 z^2 + a_3 z^3) \right)^2 \]

(minimize w.r.t. $a_0$, $a_1$, $a_2$, $a_3$ and $\theta$)
$z = x \cos(\theta) + y \sin(\theta)$

The initial estimate of $\theta$, from the L.S.E. planar-fit,
is refined. The equations to be solved are
non-linear in $\theta$.

Least-Squares Criterion for a Tanh-fit :

\[ \epsilon_t = \sum_{x} \sum_{y} \left( Image[x,y] - (s \tanh(f (z + p)) + k) \right)^2 \]

(minimize w.r.t. $s$, $p$ and $k$)
$z = x \cos(\theta) + y \sin(\theta)$

$f$ is determined from the rule of thumb mentioned
in section IV, i.e. $(0.85 / \sigma_{blur})$. The edge contrast
is $2s$ and $p$ is the position.

Statistics :

$\epsilon_p / \sigma_{noise}^2$ follows the $\chi^2$-Stat. with 22 DOF.
$\epsilon_q / \sigma_{noise}^2$ approx. follows the $\chi^2$-Stat. with 21 DOF.
$\epsilon_c / \sigma_{noise}^2$ approx. follows the $\chi^2$-Stat. with 20 DOF.
$\epsilon_t / \sigma_{noise}^2$ approx. follows the $\chi^2$-Stat. with 21 DOF.
$10 (\epsilon_p - \epsilon_c) / \epsilon_c$ approx. follows the 2, 20 F-Stat.
$20 (\epsilon_q - \epsilon_c) / \epsilon_c$ approx. follows the 1, 20 F-Stat.

The above formulations are inexact because of the
non-linearity of the cubic and tanh bases (in $\theta$ and
$p$ respectively) and the fact that the value of $\theta$
used in the tanh and quadratic fits is predetermined.
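
The planar fit and the two F-ratios can be sketched as below. This is an illustration under stated assumptions, not the authors' Pascal implementation: the 5 x 5 window is inferred from the 22 degrees of freedom quoted for the three-parameter planar fit (25 samples), and `math.atan2` is used instead of a plain arctangent so the quadrant is resolved. All function names are hypothetical.

```python
import math

HALF = 2  # assumed window: x, y in {-2, ..., 2}, i.e. 25 samples


def planar_fit(window):
    """Least-squares plane a0 + a1*x + a2*y over a 5x5 window.

    window[i][j] holds the intensity at x = i - HALF, y = j - HALF.
    The coordinates are symmetric about the origin, so the normal
    equations decouple and each coefficient has a closed form.
    """
    coords = range(-HALF, HALF + 1)
    count = len(coords) ** 2
    sum_sq = sum(x * x for x in coords) * len(coords)  # sum of x^2 over the window
    a0 = sum(window[x + HALF][y + HALF] for x in coords for y in coords) / count
    a1 = sum(x * window[x + HALF][y + HALF] for x in coords for y in coords) / sum_sq
    a2 = sum(y * window[x + HALF][y + HALF] for x in coords for y in coords) / sum_sq
    residual = sum((window[x + HALF][y + HALF] - (a0 + a1 * x + a2 * y)) ** 2
                   for x in coords for y in coords)
    return a0, a1, a2, residual


def initial_theta(a1, a2):
    """Initial orientation estimate from the fitted plane's gradient."""
    return math.atan2(a2, a1)


def f_planar_vs_cubic(eps_p, eps_c):
    """10 (eps_p - eps_c) / eps_c, to be compared against F(2, 20)."""
    return 10.0 * (eps_p - eps_c) / eps_c


def f_quadratic_vs_cubic(eps_q, eps_c):
    """20 (eps_q - eps_c) / eps_c, to be compared against F(1, 20)."""
    return 20.0 * (eps_q - eps_c) / eps_c
```

The multipliers 10 and 20 are the ratios of degrees of freedom, 20/2 and 20/1, which turn the residual differences into F-statistics.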



Appendix  III  :  Localization  of  Tanh-Fit

Let $E(x)$ be an ideal step of height $S$ at the origin,
$G(x)$ be a normalized gaussian with "standard deviation"
$\sigma_{blur}$ and $\sum_k \delta(x - k)$ represent a discrete
sampling function.

\[
E(x) = \begin{cases} 0 & \text{if } x < 0 \\ S & \text{if } x > 0 \end{cases}
\qquad
G(x) = \frac{1}{\sqrt{2\pi}\, \sigma_{blur}} \exp\left( -\frac{x^2}{2 \sigma_{blur}^2} \right)
\qquad
\delta(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0 \end{cases}
\]

Further, let $\eta(k)$ represent additive white gaussian
noise with standard deviation $\sigma_{noise}$. Then, we can
model a one-dimensional step-edge at position $p$, as
below (where $*$ denotes convolution).

\[ I(k) = [E(x - p) * G(x)]\, \delta(x - k) + \eta(k) \]

Let the function fitted to the data, $\hat{I}(k)$, be

\[ S \left[ 0.5 + 0.5 \tanh\left( \frac{0.85}{\sigma_{blur}} (k - p - \epsilon) \right) \right] \]

where $\epsilon$ is the error in position-localization. The factor 0.85 was
chosen to minimize the total-square-error, in the
absence of noise. Then, the total-square-error $\xi$ is
given by

[equation illegible in the scanned source]

Minimizing $\xi$ w.r.t. $\epsilon$ by equating $\partial \xi / \partial \epsilon$ to zero, we get

[equation illegible in the scanned source]

where [ · ] is the error term from the preceding eqn.

Now, let's assume the signal to have a high S.N.R.
Then, $\epsilon$ is small and we can substitute the $\mathrm{sech}^2$ and
tanh terms by the first two terms of their Taylor
series expansion w.r.t. $\epsilon$. Dropping the $\epsilon^2$ term and
simplifying, we get

[equation illegible in the scanned source]

where

\[ \Omega(k) = \frac{1}{S} [E(x - p) * G(x)]\, \delta(x - k) - \left[ 0.5 + 0.5 \tanh\left( \frac{0.85}{\sigma_{blur}} (k - p) \right) \right] \]

Notice that $\Omega(k)$ are samples of the error profile
shown in Fig. 5. Hence, the magnitude of $\Omega(k)$ is
bounded by 0.01. Invoking the high S.N.R. assumption
once again, for typical values of $\sigma_{blur}$ ($\approx 0.6$),
we can drop the last term by comparison with the
other $\epsilon$ term. Then,

[equation illegible in the scanned source]

where $\Omega(k)$ is as defined above.

Taking the expected value w.r.t. $\eta$, the noise, it follows
that $\epsilon$, the error, is biased. The bias is a function
of the position, $p$, of the original step and for a
typical $\sigma_{blur}$ (= 0.6), it closely resembles a sinusoid
with a period of 1 pixel-width and amplitude 1.06E-2
pixel. In practice, we would be required to quantize
the position we determine from the tanh-fit and in all
likelihood the quantization error will be an order of
magnitude more than the bias. Taking the expectation
of $\epsilon^2$ w.r.t. $\eta$, we get

[equation for $E[\epsilon^2 | p]$ illegible in the scanned source]

Now, let's obtain an expression for the root-mean-
square of the total-error, taking position-quantization
into account. Let the quantization interval, $\Delta p$, be
0.1 pixel and the quantization levels be centered
around the origin. Further, consider $p$, the actual
location of the edge, to be a uniformly distributed
random variable in the interval (-0.5, +0.5) and let
$\sigma_{blur} = 0.6$. Then, it can be shown that the quantization
error is approximately uncorrelated to the bias.
Hence, the mean-square of the total-error is the sum
of the expectation of $E[\epsilon^2 | p]$ w.r.t. $p$ and $\Delta p^2 / 12$.
The latter is the variance of the quantization error [15].
For the above choices of $\sigma_{blur}$ and $\Delta p$, the
root-mean-square error can be numerically evaluated
to be

[expression illegible in the scanned source]

Perhaps, it should be pointed out that the exact choice
of the range of summation in the above expressions does
not matter, as the first two terms on either side of the
origin dominate the calculation. Under simulation with a 55-
pixel-width window centered about the origin, this
expression was found to be in error by less than 5%
for S.N.R. > 8 and less than 10% for S.N.R. > 4.
Note that intensity quantization effects were neither
accounted for in the analysis, nor present in the simulations.
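
The bound on the error profile can be checked directly. The sketch below is an illustration, not the authors' code. It assumes a unit step (S = 1) blurred by a Gaussian, which has the closed form 0.5 (1 + erf(x / (σ √2))), and compares it on a dense grid with the tanh profile that uses the factor 0.85. Function names and the grid parameters are hypothetical.

```python
import math


def blurred_unit_step(x, sigma):
    """Unit step at the origin convolved with a normalized Gaussian."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def tanh_profile(x, sigma, factor=0.85):
    """Tanh approximation 0.5 + 0.5 tanh(factor * x / sigma)."""
    return 0.5 + 0.5 * math.tanh(factor * x / sigma)


def max_profile_error(sigma, half_width=6.0, samples=6001):
    """Largest |blurred step - tanh profile| on a dense grid of x values."""
    worst = 0.0
    for n in range(samples):
        x = -half_width + 2.0 * half_width * n / (samples - 1)
        worst = max(worst, abs(blurred_unit_step(x, sigma) - tanh_profile(x, sigma)))
    return worst


if __name__ == "__main__":
    print(max_profile_error(0.6))
```

Because both profiles depend on x only through x / σ, the maximum error is the same for every blur width; only its location scales with σ.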




Acknowledgement 

Jim Herson and Cregg Cowan, both at S.R.I.
International, made it possible for us to offer a comparison
of our results with those of an implementation
of the Marr-Hildreth Operator. V.S.N. is indebted to
Ron Fearing, who was a constant source of encouragement,
and to Brian Wandell for his excellent editorial
comments.


References 

[1]  I.E. Abdou, W.K. Pratt: "Quantitative Design and
Evaluation of Enhancement/Thresholding Edge
Detectors," Proc. IEEE, Vol. 67, No. 5, May 1979,
753-763.

[2]  H.C. Andrews, B.R. Hunt: "Digital Image Restoration,"
Prentice-Hall Inc., Englewood Cliffs, 1977.

[3]  T.O. Binford: "Inferring Surfaces from Images,"
Artificial Intelligence, 17, August 1981, 205-244.

[4]  P. Blicher: "Edge Detection and Geometric
Methods in Computer Vision," Ph.D. Thesis,
Math. Dept., U.C. Berkeley, October 1984.

[5]  M. Brady: "Computational Approaches to Image
Understanding," Computing Surveys, Vol. 14,
No. 1, March 1982, 3-71.

[6]  J.F. Canny: "Finding Edges and Lines in Images,"
AI-TR 720, M.I.T. A.I. Lab., June 1983.

[7]  L.S. Davis: "A Survey of Edge Detection Techniques,"
Computer Graphics and Image Processing,
Vol. 4, No. 3, Sep. 1975, 248-270.

[8]  R.M. Haralick: "Digital Step Edges from Zero
Crossing of Second Directional Derivatives,"
IEEE Trans. PAMI-6, No. 1, Jan. 1984, 58-68.

[9]  M.H. Hueckel: "An Operator which Locates Edges
in Digitized Pictures," Journal of the ACM,
Vol. 18, No. 1, Jan. 1971, 113-125.

[10] Y. Leclerc, S.W. Zucker: "The Local Structure of
Image Discontinuities in One Dimension," TR-
83-19R, Computer Vision and Robotics Lab.,
McGill Univ., May 1984.

[11] D.C. Marr, E. Hildreth: "Theory of Edge Detection,"
Proc. R. Soc. Lond., B 207, 1980, 187-217.

[12] V.S. Nalwa: "On Detecting Edges," presented at
the Image Understanding Workshop, New Orleans,
Louisiana, Oct. 1984.

[13] R. Nevatia, K.R. Babu: "Linear Feature Extraction
and Description," Computer Graphics and
Image Processing, Vol. 13, 1980, 257-269.

[14] F. O'Gorman: "Edge Detection using Walsh Functions,"
Artificial Intelligence, 10, 1978, 213-233.

[15] A.V. Oppenheim, R.W. Schafer: "Digital Signal
Processing," Prentice-Hall Inc., Englewood Cliffs,
1975.

[16] J.M.S. Prewitt: "Object Enhancement and
Extraction," in Picture Processing and Psychopictorics,
B.S. Lipkin and A. Rosenfeld, Eds.,
Academic Press, N.Y., 1970, 75-149.

[17] K.S. Shanmugam, F.M. Dickey, J.A. Green: "An
Optimal Frequency Domain Filter for Edge
Detection in Digital Images," IEEE Trans.
PAMI-1, No. 1, Jan. 1979, 37-49.

[18] K. Turner: "Computer Perception of Curved
Objects using a Television Camera," Ph.D.
Thesis, A.I. Lab., Univ. of Edinburgh, Nov. 1974.




Visual  Surface  Interpolation: 
A  Comparison  of  Two  Methods.


by 


Terrance  E.  Boult†

Columbia  University  Computer  Science  Department 
NYC,  NY  10027.  tboult@cs.columbia.edu 




§0  Abstract 

We critically compare 2 different methods for visual surface
interpolation. One method uses the reproducing kernels of
Hilbert spaces to construct a spline interpolating the data, such
that this spline is of minimal norm. The other method, presented
in Grimson (1981), recovers the surface of minimal norm by
direct minimization of the norm with a gradient projection
algorithm. We present the problem that each algorithm is
attempting to solve, then briefly introduce both methods. The
main contribution is an analysis of each algorithm in terms of the
worst case running time (serial processor), space complexity,
and rough estimates of the running time and space costs for
massively parallel implementations. We then conclude with a
discussion of the differences in the internal representation of the
surface in both algorithms.


§1  Introduction. 

It has been shown that when presented with sparse depth data
(say from random dot stereograms) the human visual system
infers a smooth surface passing through these data points. The
problem of computer visual surface interpolation is to take a
sparse set of depth values and calculate the surface passing
through those points that seems to model the surface that humans
infer from those same data points. Grimson (1981) presented a
computational model of this process in the human visual system,
and suggested an algorithm that may be used to recover the
perceived surface from the depth data.

†This work supported in part by DARPA grant N00039-84-C-0165
and in part by NSF grant MCS-782-3676

Although it may be fruitful, from a psychological point of
view, to develop algorithms that may be biologically realizable,
this restriction may increase the computational cost of the
algorithms. Therefore we compare, without regard to biological
feasibility, two methods for the solution of this visual surface
interpolation problem with the intent to determine which is a more
efficient algorithm for use in computer vision.

The first of these methods is that presented in Grimson
(1981). Grimson's approach was to represent the surface by a
grid of depth values, and to use nonlinear programming techniques
to directly minimize the "quadratic variation" or bending energy
of the surface. Because the problem was to interpolate the given
data, Grimson employed a constrained optimization algorithm
called the gradient projection algorithm. Because of this we shall
refer to Grimson's approach as the gradient projection based
algorithm.




The second method we shall examine is the method of
reproducing kernels. This method uses the reproducing kernels
of Hilbert or semi-Hilbert spaces to calculate splines of minimal
norm. The use of surfaces of minimal norm as the visual surface
interpolating the depth data is done in the spirit of the minimization
approach used in Grimson (1981). The use of reproducing
kernels to recover splines of minimal norm is not a new idea. It
has been studied by Duchon (1976a, 1976b), Meinguet (1979a,
1979b) and more recently by Franke (1982, 1983), Franke and
Nielson (1980) (all but Meinguet called them thin plate splines).
However, the method has not previously been given serious
consideration for visual surface interpolation - probably because
it seems unlikely that the human visual system uses such an
approach.

In section 2 we derive a precise formulation of the visual
surface interpolation problem. Section 3 presents details of both
of the above algorithms. In section 4 we present and compare
algorithmic properties (time, space and parallel time complexity,
optimality and accuracy of the solution) of both methods.
Section 5 is a discussion of the representational advantages of
splines in functional form over simply having a grid of depth
values. In section 6 we discuss the extensibility of both methods
to other spaces of functions, and other norms. Section 7 presents
our conclusions.

§2  The  Problem. 

A naive formulation of the visual surface interpolation
problem might be :

to find "the best approximation" to a surface using only
the knowledge of a number of given points thereon,
where we require the surface to be interpolatory, i.e. to
pass through all the given data.


A major difficulty with this formulation is that it is not well
posed, inasmuch as the information does not uniquely determine
a solution. In fact, given any set (of zero measure) of points on a
surface there are infinitely many surfaces interpolating those
points. To alleviate this problem, we must somehow restrict the
class of allowed surfaces and/or give some method of ranking the
"plausibility" of a surface.

One of the classical ways of insuring that a problem has a
unique solution (applied to visual surface interpolation in
Grimson (1981) and Kender, Lee, and Boult (1985)) is to use a
functional on the surface as a measure of the "unreasonableness"
of the surface, and to restrict the allowed class of surfaces to
make it a Hilbert or semi-Hilbert space and make the functional a
norm or semi-norm on this space. This formulation insures that
there exists a unique solution to the problem of finding a surface
from the allowed class which minimizes the functional (and hence
is the most reasonable). Throughout this paper we shall assume
that this type of formulation is appropriate for the problem of
visual surface interpolation. We shall not investigate which
classes of surfaces are most appropriate, nor which functionals
may be good measures of the unreasonableness of a surface.
Readers interested in this aspect of the problem may consult Boult (1986).

In what follows we choose to define "best approximation" in
terms of minimal error. We assume that error can be measured by
a norm with respect to the given class of functions. The norm
might be the sup norm (i.e. the maximal difference between the
actual surface and the approximation), or the $L_2$ norm (integral of
the square of the difference at each point). The error may be
measured in either a relative (e.g. error of 5%) or an absolute
sense (e.g. the surfaces never differ by more than .1 mm)
depending on the goals of the user. Finally the error may be
measured in the worst case, or on the average (with respect to
some measure).




Combining these assumptions a precise formulation of the
problem of visual surface interpolation from sparse depth data
becomes :

Let $F_1$, the space of allowed surfaces, be a Hilbert or
semi-Hilbert space. Let $F_2$ be the elements of $F_1$
restricted to a finite domain D (since we are only
interested in recovering a finite portion of a possibly infinite
surface). Let $\Theta(f): F_1 \to \mathbb{R}$ be a functional measuring
the "unreasonableness" of a surface (i.e. the more
reasonable a surface f, the smaller $\Theta(f)$), where $\Theta$ is a norm on
$F_1$ (a semi-norm if $F_1$ is semi-Hilbert). Let $N(f) =
\{z_1, ..., z_k\} = \{f(x_1,y_1), ..., f(x_k,y_k)\}$ be the allowed
information (i.e. the allowed input to solve the problem is
k depth values). Then the visual surface interpolation
problem is to find (using only $N(f)$) $f^* \in F_1$, such that
$\Theta(f^*) = \min_{g \in F_1} \Theta(g)$, the minimum being taken over
the surfaces g that interpolate $N(f)$.

Kender, Lee and Boult (1985) show (as a special case of
work on information based complexity, see Traub and
Woźniakowski (1980), or Traub, Wasilkowski and
Woźniakowski (1983)) that given the above formulation the surface
minimizing the functional $\Theta(f)$ will also be the minimal error
surface with respect to the class $F_1$ for almost any error norm.

One functional to measure unreasonableness that is used by
both Grimson (1981) (who called it quadratic variation) and
Kender, Lee and Boult (1985) is given by : (Equation (2.1))

\[
\Theta(f) = \left\{ \iint \left[ \left( \frac{\partial^2 f}{\partial x^2} \right)^2 + 2 \left( \frac{\partial^2 f}{\partial x \partial y} \right)^2 + \left( \frac{\partial^2 f}{\partial y^2} \right)^2 \right] dx\, dy \right\}^{1/2}
\]

We note that this is just one particular choice for the
functional and that this functional is the norm or semi-norm for a
number of different classes, see Boult (1985a). It is known that
different functionals (and the associated classes for which they
are norms) give rise to different interpolation problems, and
hence to different interpolating surfaces. The reader interested in
other norms and their associated classes should consult Grimson
(1981), Boult (1985a) or Boult (1986).

§2.1  Allowed  Information.

We now consider the allowed form of the information $N(f) =
\{z_1, ..., z_k\} = \{f(x_1,y_1), ..., f(x_k,y_k)\}$. Each piece of
information consists of 2 components, a function value and the location
of that evaluation. Throughout the paper we shall assume we are
given the value (height above or depth below a reference plane)
of the surface at known points in x-y space. Note that this
precludes the use of surface gradients, normals, curvature, etc.
This pure depth data might be the result of a stereo based
process, a rangefinder or be synthetically generated.

The other component of information is the location of the
function evaluations. We consider two separate ways of
determining the locations for the information. The first method is to
obtain the information from triangulation between matched
points in the zero crossings of the Laplacian of the Gaussian of a
stereo pair of intensity images (hereafter $\nabla^2 G$ zero crossing
information). This type of information was proposed by Marr
and Poggio (1979) as that available in the human visual system,
and was used by Grimson in the development of his
computational study of surface interpolation in the human visual system.
Another method of choosing the location of the information is to
use some fixed and regular pattern, e.g. a regular square grid of r
points per side, each point separated by a distance h (thus the
number of depth samples is $k = r^2$). This regular grid
information would be very difficult, if not totally impossible, to obtain
in a passive stereo system but is easily obtained from active


ranging system. The major difference then between the two
types of information is the location of the information samples,
which may be affected by the availability of an active ranging
system (an option not open to the human visual system).

We note that information derived from the zero crossings of
$\nabla^2 G$ yields locations (both the number and the position of them) that
depend in a very nonlinear way on the surface viewed. This
extra information (i.e. the knowledge that information is
evaluated at the location of the zero crossings of the intensity
image) is not used by any algorithm known to this author. After
a casual reading, it might seem that Grimson's algorithm should
take advantage of this extra information, inasmuch as the surface
consistency constraint, Grimson (1981, p130), shows a
relationship between the location of the zero crossings and the variation
of a surface. However, Grimson's algorithm is based on the
choice of a functional (quadratic variation) that does not truly
embody the surface consistency constraint, because it minimizes
the total variation of the surface and not the variation between
zero crossings. Note that it is not necessarily true that the
interpolating surface with minimal total variation also has minimal
variation between each set of zero crossings. To see this
consider an interpolating surface that has almost zero variation
between all but one pair of zero crossings (and hence generally
satisfies the surface consistency constraint), but whose variation
between that pair is arbitrarily large (maybe the surface is not even
continuous at one of the zero crossings). Such a surface may
have arbitrarily large total surface variation but may have minimal
variation between zero crossings (except the one pair).

Since neither the reproducing kernel algorithm nor the
gradient projection based algorithm makes special use of $\nabla^2 G$
type information, we shall freely compare them with respect to
both $\nabla^2 G$ zero crossings and regular grid information. Hereafter,
we shall let the number of information points (regardless of
its origin) be denoted by k, and the set of information by $N(f) =
\{z_1, ..., z_k\} = \{f(x_1,y_1), ..., f(x_k,y_k)\}$.

§2.2  The  Desired  Output. 

The final component in the formalization of the problem is to
specify what it means to find an approximation, i.e. we must
consider the representation of the desired solution. Though there
are many representations we might choose, we shall examine only
those two used by the methods under consideration.

The first and simplest representation of the surface is as
function values at some predefined points (e.g. on a 2-d mesh).
This is the representation used by the gradient projection based
algorithm. In this algorithm, the grid is a uniform 2-d mesh,
large enough to include all information points. We shall let the
total number of points in this grid be n.

The other representation of the solution surface we shall
consider is as a function of x and y, which can be evaluated at
any point. Obviously given this representation the first
representation can be recovered but not vice versa.

Note that the user may be interested in recovering the
interpolated surface at fewer than the n points used in the first
representation. Hereafter let p be the number of points at which the
interpolatory surface is to be recovered. We need not require that
the p solution points contain or be contained in the k information
points.

Finally we note that it would be improper to compare two
different methods if they were calculating the surface in different
representations. Therefore, throughout sections 3-6 we shall
assume the reproducing kernel method is used first to calculate its
spline representation, then the spline is evaluated at the p points
the solution is desired at, which is a subset of the n points used in
the gradient projection based algorithm.

§3  Description  of  the  Two  Methods  of  Solution.

In this section we briefly describe the two methods of
solution to the surface interpolation problem, which we shall be
comparing in this paper. We shall refer to the two methods as the
gradient projection based algorithm and reproducing kernel
algorithm. We start by discussing the theoretical basis that they
have in common.

Neither method actually requires that $\Theta(f)$ be quadratic
variation as in (2.1), only that it be a norm or semi-norm on the
space $F_1$. Both methods rely on the theorem from functional
analysis that states if $\Theta(f)$ is a norm over $F_1$ and $F_1$ is a
Hilbert space then there exists a unique function from $F_1$
minimizing $\Theta(f)$. (If $\Theta(f)$ is a semi-norm and $F_1$ is only a semi-
Hilbert space then the solution exists and is unique up to a member
of the null space of $\Theta(f)$.)

The two methods differ in how they minimize the functional
$\Theta(f)$ (with respect to the class of functions $F_1$) and in their
representation of the solution.

§3.1  The  Gradient  Projection  Based  Algorithm.

We now examine the gradient projection based algorithm as
discussed in Grimson (1981). Inherent in the development of
this algorithm is the representation as "explicit depth values at all
locations within a Cartesian grid of uniform spacing" Grimson
(1981, p180). It is also assumed that the information is given at
points within this grid, and for simplicity that the grid is
square with size m x m (where $m = \sqrt{n}$). In the following
discussion each grid point is represented by its coordinate
location (i,j), $1 \le i,j \le m$, and the solution surface is represented
as its value at each grid point, i.e. $s_{i,j}$. Grimson begins by
deriving a discrete analogue of the functional $\Theta(f)$, and then
solves the discrete minimization problem given by: (Equation
(3.1))

\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{m-2} \sum_{j=0}^{m-1} (s_{i-1,j} - 2 s_{i,j} + s_{i+1,j})^2 \\
+ & \sum_{i=0}^{m-1} \sum_{j=1}^{m-2} (s_{i,j-1} - 2 s_{i,j} + s_{i,j+1})^2 \\
+ & \sum_{i=0}^{m-2} \sum_{j=0}^{m-2} (s_{i,j} - s_{i+1,j} - s_{i,j+1} + s_{i+1,j+1})^2
\end{aligned}
\]

subject to $s_{i,j} = f(i,j) \;\; \forall\, f(i,j) \in N(f)$.

To solve this problem he uses a nonlinear programming
algorithm called the gradient projection algorithm (actually he seems to
use a modified version of this algorithm usually known as
Goldfarb's algorithm, see Avriel (1976)). To implement this
algorithm he develops stencils (see Grimson (1981, p183-184))
to allow the calculation of the gradient of the objective function.
To determine the amount to move in this direction, one must
calculate the minimum of the objective function in that direction.
To do this Grimson calculates the value $\alpha$ that minimizes the
expression: (Equation (3.2))

\[
\begin{aligned}
& \sum_{i=1}^{m-2} \sum_{j=0}^{m-1} (s_{i-1,j} - 2 s_{i,j} + s_{i+1,j} + \alpha d_{i-1,j} - 2 \alpha d_{i,j} + \alpha d_{i+1,j})^2 \\
+ & \sum_{i=0}^{m-1} \sum_{j=1}^{m-2} (s_{i,j-1} - 2 s_{i,j} + s_{i,j+1} + \alpha d_{i,j-1} - 2 \alpha d_{i,j} + \alpha d_{i,j+1})^2 \\
+ & \sum_{i=0}^{m-2} \sum_{j=0}^{m-2} (s_{i,j} - s_{i+1,j} - s_{i,j+1} + s_{i+1,j+1} + \alpha d_{i,j} - \alpha d_{i+1,j} - \alpha d_{i,j+1} + \alpha d_{i+1,j+1})^2
\end{aligned}
\]

where $d_{i,j}$ is the negative of the value obtained from the
convolution of the appropriate stencil (see Grimson (1981, p183-
184)) with $s_{i,j}$ (i.e. the negative gradient direction or direction of
steepest descent for minimizing the surface variation). He then
concludes that $\alpha = \alpha_1 / \alpha_2$ where (Equation (3.3))

\[
\begin{aligned}
\alpha_1 = - \Bigg[ & \sum_{i=1}^{m-2} \sum_{j=0}^{m-1} (s_{i-1,j} - 2 s_{i,j} + s_{i+1,j}) (d_{i-1,j} - 2 d_{i,j} + d_{i+1,j}) \\
+ & \sum_{i=0}^{m-1} \sum_{j=1}^{m-2} (s_{i,j-1} - 2 s_{i,j} + s_{i,j+1}) (d_{i,j-1} - 2 d_{i,j} + d_{i,j+1}) \\
+ & \sum_{i=0}^{m-2} \sum_{j=0}^{m-2} (s_{i,j} - s_{i+1,j} - s_{i,j+1} + s_{i+1,j+1}) (d_{i,j} - d_{i+1,j} - d_{i,j+1} + d_{i+1,j+1}) \Bigg]
\end{aligned}
\]

and (Equation (3.4))

\[
\begin{aligned}
\alpha_2 = & \sum_{i=1}^{m-2} \sum_{j=0}^{m-1} (d_{i-1,j} - 2 d_{i,j} + d_{i+1,j})^2 \\
+ & \sum_{i=0}^{m-1} \sum_{j=1}^{m-2} (d_{i,j-1} - 2 d_{i,j} + d_{i,j+1})^2 \\
+ & \sum_{i=0}^{m-2} \sum_{j=0}^{m-2} (d_{i,j} - d_{i+1,j} - d_{i,j+1} + d_{i+1,j+1})^2
\end{aligned}
\]

Thus the complete gradient projection based algorithm
employed by Grimson consists of the following 5 steps:

Step 1: Determine a feasible initial surface (any surface
interpolating the information will do).

Step 2: Compute the negative of the gradient direction (the d_{i,j}
above) by taking the convolution of the current
approximation (the s_{i,j}'s) with the stencils (setting
d_{i,j} = 0 if (i,j) is an information point).

Step 3: Compute α_1 and α_2 (from formulas (3.3) and (3.4)
above) and then set α = α_1 / α_2.

Step 4: Refine the surface approximation (i.e. for each i,j set
s_{i,j} := s_{i,j} + α d_{i,j}).

Step 5: If d_{i,j} ≤ ε ∀ i,j < m then the approximation is complete;
else go to Step 2.
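
The five steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Grimson's implementation: the negative gradient is accumulated directly from the difference terms of expression (3.2) instead of from his stencils, and the step length is the generic least-squares minimizer of a sum of terms (A + αB)², namely α = -ΣAB / ΣB², rather than a transcription of formulas (3.3) and (3.4). All function names, the tolerance `tol` and the mean-valued initial surface are hypothetical choices.

```python
import numpy as np

def variation_terms(s):
    """The three difference arrays whose squares are summed in (3.2) with alpha = 0."""
    sxx = s[:-2, :] - 2.0 * s[1:-1, :] + s[2:, :]            # i = 1..m-2, all j
    syy = s[:, :-2] - 2.0 * s[:, 1:-1] + s[:, 2:]            # all i, j = 1..m-2
    sxy = s[:-1, :-1] - s[1:, :-1] - s[:-1, 1:] + s[1:, 1:]  # i, j = 0..m-2
    return sxx, syy, sxy

def energy(s):
    """Discrete functional: sum of the squared difference terms."""
    return float(sum(np.sum(t * t) for t in variation_terms(s)))

def neg_gradient(s):
    """Negative gradient of energy(s); stands in for the stencil convolution."""
    sxx, syy, sxy = variation_terms(s)
    g = np.zeros_like(s)
    g[:-2, :] += 2 * sxx; g[1:-1, :] -= 4 * sxx; g[2:, :] += 2 * sxx
    g[:, :-2] += 2 * syy; g[:, 1:-1] -= 4 * syy; g[:, 2:] += 2 * syy
    g[:-1, :-1] += 2 * sxy; g[1:, :-1] -= 2 * sxy
    g[:-1, 1:] -= 2 * sxy; g[1:, 1:] += 2 * sxy
    return -g

def step_length(s, d):
    """Exact minimizer over alpha of energy(s + alpha * d) (assumed formula)."""
    num = den = 0.0
    for a, b in zip(variation_terms(s), variation_terms(d)):
        num += float(np.sum(a * b))
        den += float(np.sum(b * b))
    return -num / den if den > 0.0 else 0.0

def initial_surface(mask, values):
    """Step 1: keep the known depths, fill the rest with their mean."""
    return np.where(mask, values, values[mask].mean()).astype(float)

def gradient_projection_interpolate(mask, values, tol=1e-8, max_iter=5000):
    s = initial_surface(mask, values)              # Step 1
    for _ in range(max_iter):
        d = neg_gradient(s)                        # Step 2
        d[mask] = 0.0                              # never move information points
        if np.max(np.abs(d)) <= tol:               # Step 5
            break
        s = s + step_length(s, d) * d              # Steps 3 and 4
    return s
```

Because the direction is zeroed at the information points, every iterate still interpolates the data, and the exact line search keeps the discrete functional from increasing.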


§3.2  The Method of Reproducing Kernels.

The method of reproducing kernels calculates a spline
function that exactly solves the continuous problem of finding
the function from F_1 minimizing Θ(f). There are at least two
different algorithms based on the use of reproducing kernels; we
shall present only one. The interested reader may consult Boult
(1985b) or Boult (1986) for a more detailed discussion of both
algorithms based on reproducing kernels. The following
discussion of reproducing kernels for interpolation is based on the
theoretical work of Meinguet (1979a, 1979b).

For this method to be appropriate it is sufficient to have F_1 be
a semi-Hilbert space and Θ(f) the associated semi-norm with null
space Π_1. (Throughout this paper Π_1 is the space spanned by
{1, x, y}.) To ensure uniqueness of the solution we must assume
that the information N_k(f) contains a Π_1-unisolvent subset, i.e.
there exists a set J of indices (a subset of the index set I = 1...k)
and associated information points (x_j, y_j) with information values z_j
such that for each element j of J (there are 3 in the present case)
there exists a unique p_j(x,y) ∈ Π_1 such that for all j, j' from J,
p_j(x_j, y_j) = 1 and p_j(x_j', y_j') = 0 if j ≠ j'. Note that if N_k contains
evaluations at 3 or more non-collinear points, then N_k(f) will
contain a Π_1-unisolvent subset. (This restriction on the
information having at least 3 non-collinear points also applies to the
gradient projection based algorithm.)
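
For three non-collinear points the functions p_j can be computed by inverting a 3 by 3 matrix. The sketch below is only an illustration of the unisolvence condition; the function names and the collinearity tolerance are assumptions of the example, not part of the method as published.

```python
import numpy as np

def cardinal_polynomials(points, tol=1e-12):
    """Return C such that p_j(x, y) = C[0, j] + C[1, j] * x + C[2, j] * y.

    `points` holds the three information points (x_j, y_j) indexed by J.
    Raises ValueError when the points are (numerically) collinear, in which
    case they are not unisolvent for the span of {1, x, y}.
    """
    V = np.array([[1.0, x, y] for x, y in points])   # row j is (1, x_j, y_j)
    if abs(np.linalg.det(V)) < tol:
        raise ValueError("points are collinear: no unisolvent subset")
    return np.linalg.inv(V)                          # V @ C = identity

def eval_p(C, j, x, y):
    """Evaluate p_j at (x, y)."""
    return C[0, j] + C[1, j] * x + C[2, j] * y
```

With the points (0,0), (1,0), (0,1) this yields p_0 = 1 - x - y, p_1 = x and p_2 = y, each equal to 1 at its own point and 0 at the other two.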

In the development of this method we use the fact that we
can separate the space F_1 into X_0 ⊕ Π_1, X_0 = {g ∈ F_1 : g(x_j, y_j)
= 0, ∀ j ∈ J}, where ⊕ is a (topological) direct sum. With this
decomposition X_0 is a Hilbert space with Θ(·) as a norm (not a
semi-norm). Given the reproducing kernel K_0((s,t);(x,y)) of X_0,
which can be expressed in terms of the reproducing kernel of
F_1 and the functions p_j(x,y) (see Boult (1985b)), the spline
surface of minimal Θ(·) norm which interpolates the
information N(f) is given by (Equation (3.5)):

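\[
\sigma(x,y) = \sum_{i \in I-J} \gamma_i \, K_0\bigl((x_i,y_i);(x,y)\bigr) + \sum_{j \in J} z_j \, p_j(x,y)
\]

where the coefficients γ_i can be calculated from the (k-3) by (k-3)
dense linear system (Equation (3.6)):

\[
\sum_{i \in I-J} K_0\bigl((x_k,y_k);(x_i,y_i)\bigr)\,\gamma_i = z_k - \sum_{j \in J} z_j \, p_j(x_k,y_k) \qquad \forall\, k \in I-J.
\]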

The reproducing kernel method then consists of the three
following steps:

Step 1: Calculate the matrix of coefficients for the left hand side
of equation (3.6).

Step 2: Compute γ_i, i = 1...k-3, the solution to equation (3.6).

Step 3: Compute the value of the interpolating surface at all
solution points using equation (3.5).

Note that Steps 1 and 2 are necessary parts of the algorithm,
whereas Step 3 is simply to allow comparison of this method with
the gradient projection based algorithm. Also note that for fixed
regular data it is possible to precompute the Cholesky
decomposition of the coefficient matrix (which is determined entirely by
the location of the information), and then Step 2 is simply the
calculation of the γ_i's using back substitution.

A proof that the above spline is of minimal norm can be
found in Meinguet (1979a, 1979b).
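
The three steps of the reproducing kernel method can be sketched as follows. The kernel used here is an assumption of the example: it is built from the thin plate radial function r² log r and the functions p_j by the standard construction for conditionally positive definite functions, and it is not necessarily the kernel K_0 derived in Boult (1985b). The names `phi`, `fit_spline` and `surface` are hypothetical, the first three points are taken as the unisolvent subset J, and at least four information points are required.

```python
import numpy as np

def phi(a, b):
    """Thin plate radial function r^2 log r (assumed stand-in kernel for F_1)."""
    r2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return 0.0 if r2 == 0.0 else 0.5 * r2 * np.log(r2)

def fit_spline(points, z, J=(0, 1, 2)):
    """Return a function (x, y) -> spline value interpolating z at points."""
    pts = [(float(x), float(y)) for x, y in points]
    z = np.asarray(z, dtype=float)
    J = list(J)
    rest = [i for i in range(len(pts)) if i not in J]          # the set I - J
    C = np.linalg.inv(np.array([[1.0, pts[j][0], pts[j][1]] for j in J]))
    G = np.array([[phi(pts[j], pts[l]) for l in J] for j in J])

    def p(a):
        """Vector of p_j(a) for j in J."""
        return np.array([1.0, a[0], a[1]]) @ C

    def K0(a, b):
        """Kernel that vanishes whenever a or b is a point of J."""
        pa, pb = p(a), p(b)
        fa = np.array([phi(a, pts[j]) for j in J])
        fb = np.array([phi(pts[j], b) for j in J])
        return phi(a, b) - pa @ fb - pb @ fa + pa @ G @ pb

    # Step 1: coefficient matrix (and right hand side) of equation (3.6).
    A = np.array([[K0(pts[k], pts[i]) for i in rest] for k in rest])
    rhs = np.array([z[k] - p(pts[k]) @ z[J] for k in rest])

    # Step 2: Cholesky factorization followed by two triangular solves.
    # (np.linalg.solve is used for brevity; a triangular solver is cheaper.)
    Lc = np.linalg.cholesky(A)
    gamma = np.linalg.solve(Lc.T, np.linalg.solve(Lc, rhs))

    # Step 3: evaluate equation (3.5) at any requested point.
    def surface(x, y):
        a = (float(x), float(y))
        kernel_part = sum(g * K0(pts[i], a) for g, i in zip(gamma, rest))
        return float(kernel_part + p(a) @ z[J])

    return surface
```

For fixed information locations the factor `Lc` depends only on the points, so it could be computed once and reused for every new set of depth values, which is the precomputation described above.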


§4  Comparison  of  Computational  Issues  of 
The  Two  Methods. 

In this section we provide an analysis and comparison of the
two visual surface interpolation methods on a number of
computational issues. These issues and the subsections in which
they are treated are:

§4.1  Time complexity,

§4.2  Space complexity,

§4.3  Inherent Parallelism and Parallel Time Complexity,

§4.4  Optimality and Accuracy of Solution.

For  all  of  the  comparisons  we  shall  assume  that  k,  p,  n  are 
defined  as  in  sections  2.  and  3.  When  we  refer  to  the  steps  of  the 
gradient  projection  based  algorithm  and  the  reproducing  kernel 
method,  we  are  referring  to  the  steps  as  defined  in  sections  3.1 
and 3.2 respectively. A synopsis of the results can be found in
Table 4.1.




§4.1  Time  Complexity. 

First let us estimate an upper bound on the worst case
running time (assuming each arithmetic operation costs unity) of
the reproducing kernel method when the information is from
∇²G zero crossings. Obviously step 1 costs O(k²), and step 3
costs O(kp). For step 2, using Cholesky decomposition (the
matrix is positive definite), the cost will be .5k³. Therefore the
worst case cost is .5k³ + O(k² + kp). (A careful analysis of the
current implementation results in a cost of ≈ .5k³ + 70kp +
O(k²), but this depends on the choice of the space F_1, norm Θ(f)
and the associated reproducing kernel.)

Now we note that if the information is gathered on a regular
grid, then we can do the Cholesky decomposition once (a
precomputation), and store the results. Given this decomposition
we can reduce the cost of step 2 to O(k²), and the overall cost to
O(k² + kp). (In fact, given the decomposition for a grid of size r
x r we also have the decomposition for all smaller grids.)

Now let us estimate an upper bound on the worst case
running time of the gradient projection based algorithm. Step 1 of
that algorithm obviously costs O(n). Examination of the stencils
given in Grimson (1981, p183) yields a cost per iteration for step
2 of approximately 26n. For step 3, equations (3.3) and (3.4)
yield a per iteration cost of approximately 24n. Finally, step 4
costs 2n per iteration. Thus the total per iteration cost of the
algorithm is ≈ 50n. We take the number of iterations to be n,
which is the upper bound on the number of iterations of
Goldfarb's algorithm (the nonlinear programming algorithm on which
the method is based) for problems of this type (see Avriel (1976,
p436)). (Note that this is far better than the O(n²) iterations
which Terzopoulos (1984, p103) suggests the gradient projection
based algorithm takes.) Combining the above we arrive at a total
estimated cost for the gradient projection based algorithm of
approximately 50n². (We note that the number of iterations will
actually depend on the number of, value of, and location of the
information. Thus one may be able to get a better estimate for
fixed regular data. However one can easily show that no
placement of data can result in less than √(n/k) iterations, and
worst case placement of fixed data can easily be shown to result
in at least 2√n iterations. Both of these bounds are trivial, and
the actual lower bound is probably O(n).)

Thus for ∇²G zero crossing information the reproducing
kernel method is faster whenever .5k³ + 70kp < 50n². And for
grid data, the reproducing kernel method is faster when O(k²) + 70kp <
50n².
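
To get a feel for where the crossover lies, the two estimates can be evaluated for concrete problem sizes. The sketch below simply codes the leading terms quoted above; the function names are hypothetical, the lower order O(k²) term is dropped, and the cubic exponent on k is the usual cost of a Cholesky factorization and is an assumption of the example.

```python
def reproducing_kernel_cost(k, p):
    """Estimated operation count for varying data: 0.5 k^3 + 70 k p."""
    return 0.5 * k ** 3 + 70 * k * p

def gradient_projection_cost(n):
    """Estimated operation count: about 50 n operations for each of n iterations."""
    return 50 * n ** 2

def kernel_method_is_faster(k, p, n):
    """True when the reproducing kernel estimate is below the gradient projection one."""
    return reproducing_kernel_cost(k, p) < gradient_projection_cost(n)
```

For example, with sparse data (k = 100 samples, p = n = 10000 grid points) the kernel estimate is far below the gradient projection estimate, whereas for dense data (k = p = n = 1000) the comparison reverses.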

§4.2  Space Complexity.

The space required for the reproducing kernel method is .5k²
+ O(k) for steps 1 and 2 (independent of the type of information),
and k+p + O(1) for step 3. (This assumes that the user needs all
p values at the same time. If the user can use the points
sequentially, then the space for step 3 is simply k + O(1).)

The space complexity of the gradient projection based
algorithm can be calculated by examining equation (3.2).
Although algorithms trying to obtain minimal time complexity
may use more space, each iteration of the algorithm can be
programmed using only 2n + O(√n) space. Note that no savings in
space is obtained if the user only requires the solution points one
at a time.

Therefore the reproducing kernel method will use less space
whenever min(k², k+p) < 2n (i.e. whenever k < √(2n), because p
≤ n).




§4.3  Inherent Parallelism and Parallel Time
Complexity.

In this subsection we examine the sources of parallelism in
both of the methods, and estimate their parallel time complexity.
Time values may vary depending on the instruction set of the
parallel machine being used, its topology, its memory limitations,
number of processors and its mode of operation (SIMD or
MIMD).

There are four different sources of parallelism in the
reproducing kernel method. The first is the evaluation of the
spline function at one point, which involves the evaluation and
summation of a weighted kernel function at each of the k
information points. This can be parallelized in a straightforward
SIMD fashion to run in time O(log k).
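
The O(log k) figure comes from summing the k weighted kernel values pairwise. The sketch below simulates such a reduction sequentially and counts the rounds; on a SIMD machine every addition within one round would run at the same time. The function name is hypothetical and the code is only meant to illustrate the round count.

```python
def tree_sum(values):
    """Pairwise (log) reduction; returns the sum and the number of parallel rounds."""
    vals = list(values)
    rounds = 0
    while len(vals) > 1:
        # All additions in this list comprehension are independent of each other.
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:            # an odd element is carried to the next round
            paired.append(vals[-1])
        vals = paired
        rounds += 1
    return (vals[0] if vals else 0), rounds
```

Eight terms are summed in three rounds, and in general k terms need ⌈log₂ k⌉ rounds.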

The second source of parallelism in the reproducing kernel
method is the evaluation of the p surface solution points. These
points can easily be evaluated simultaneously, again in a SIMD
fashion, resulting in a factor of p speedup. Combining the first
two parallelizations we could speed up the surface reconstruction
(given the coefficients of the spline) from 70kp to O(log k).

The third form of parallelism comes from the calculation of the
coefficients of the spline. Given the decomposition of the
coefficient matrix, we can compute the coefficients in parallel in
time k.

The final type of parallelism is that inherent in the solution of
a k×k linear system. This has been studied elsewhere and in
general one can gain a speedup factor of k.

Thus our estimate of the parallel running time for the
reproducing kernel algorithm is O(k²) if we must decompose the
matrix of coefficients (as we must for ∇²G zero crossing
information), and O(k) if the decomposition is precomputed (as it
may be for regular grid data).

The parallelisms inherent in the gradient projection based
algorithm include the calculation of the gradient direction, local
calculation of each of the terms needed for the calculation of the
parameter α, and updating the surface. These will reduce the
number of operations per iteration by an estimated amount of 26,
20, 2 respectively. Furthermore, given the local terms for the
calculation of the parameter α, we can speed up that calculation
by a factor of (log n) / n by using log reduction for the
summation. Note that the number of iterations cannot be reduced by
parallel implementation. The total estimated time complexity of
the parallel gradient projection based algorithm is O(n · log n).

Based on these estimates, a parallel implementation of the
reproducing kernel method would be faster than a parallel
implementation of the gradient projection based algorithm when
O(k²) < O(n · log n) and the location of the data is allowed to
vary, or when the data is fixed and O(k) < O(n · log n).

§4.4  Optimality and Accuracy of Solution.

Both methods under consideration started off with the idea of
finding the surface of minimal norm (Θ(f)) over F_1, a Hilbert
space (or semi-Hilbert space). This had the advantage of
resulting in a unique specification of the surface to recover. As
mentioned before it is known that such a surface is also a minimal
error solution among all surfaces in the class F_1 that interpolate
the data, and that this minimal error property holds for almost any
reasonable definition of error; see Kender, Lee, and Boult
(1985). Thus theoretically both methods are attempting to find an
optimal error interpolant from F_1.


The reproducing kernel method theoretically does calculate
this optimal error surface. The errors in the coefficients of the
spline surface, introduced by the approximate solution of the
linear system, however, result in the algorithm reconstructing a
different surface. The magnitude of the error in these
coefficients depends on the condition number of the linear system, which
in turn depends on the placement of the information points.
Initial experiments suggest that for a regular grid of information
the condition number is approximately 19.5 k². Because
Cholesky decomposition and back substitution are numerically
stable (given proper implementation), we know the resulting
coefficients differ from the true spline coefficients by at most e_1
≈ c_1 · 2^(-t) · 19.5 k², where t is the number of bits in the mantissa
of the floating point representation on the machine and c_1 is a
fixed constant depending on the floating point implementation.
Then the maximum error of any surface reconstruction (from the
optimal surface, not from the surface generating the information)
is ≤ k e_1 max(K_0). Note that this is totally independent of the
number of reconstruction points, but depends on the distance of
the reconstructed points from the information points.
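
The two bounds are easy to evaluate numerically. The following sketch codes them directly; the function names are hypothetical, and the default c_1 = 1 and the caller-supplied value of max(K_0) are placeholders, since neither constant is given in the text.

```python
def coefficient_error_bound(k, t, c1=1.0):
    """e_1 ~ c_1 * 2^(-t) * 19.5 * k^2 for a regular grid of k information points."""
    return c1 * 2.0 ** (-t) * 19.5 * k ** 2

def surface_error_bound(k, t, max_k0, c1=1.0):
    """Bound k * e_1 * max(K_0) on the deviation from the optimal surface."""
    return k * coefficient_error_bound(k, t, c1) * max_k0
```

For instance, with k = 10 and a 24 bit mantissa the coefficient bound is 1950 / 2^24, roughly 1.2e-4 times c_1; the bound grows quadratically in k and halves with every additional mantissa bit.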

The gradient projection based algorithm however leaves its
theory behind. The first step in the method is the discretization of
the functional to minimize. This discretization is well studied in
mathematical physics, and the error introduced by it is O(h²),
where h is the distance between grid points. The method then
attempts to minimize this discrete functional without regard for
the space F_1; thus its solution may not even be a "feasible
solution". (Note that this is not as simple a problem to overcome as
it might seem, because the discretized version of Θ(f) is no longer
a norm or semi-norm on F_1, so there is not even an assurance of a
surface minimizing the discretized version of Θ(f) existing, let
alone being reachable by a sequence of surfaces from F_1.)
Finally there is the error introduced by the gradient projection
portion of the algorithm, and by terminating the algorithm before
it has computed the exact solution (to the perturbed problem).
Currently we do not have estimates on the error of the algorithm,
but the work of Terzopoulos (1984) suggests that the error does
go to zero, albeit very slowly, as both the number of points and
number of iterations grow.


        Reproducing(F)      Reproducing(V)         Gradient Projection

TC      O(k² + kp)          k³/6 + O(k² + kp)      50n²

SC      .5k² + O(p)         .5k² + O(kp)           2n + O(√n)

PT      O(k)                O(k²)                  O(n log n)

Table 4.1.  Comparison of essential properties of 2 algorithms.
TC stands for time complexity, SC for space complexity, and PT
for parallel time complexity. Here k is the number of information
(depth) samples; p is the number of points in the desired solution;
n is the number of points in the grid used by the gradient
projection based algorithm. The F and V in the titles refer to
fixed and varying data respectively, where fixed and varying data
refer only to the location of the information points, which affects
the performance of the reproducing kernel algorithm.


§5  Advantages  of  Spline  Representation. 

In this section we discuss the advantages of the spline
representations over the mesh / grid representation used by the
gradient projection based algorithm.

The first advantage of the functional spline representation (as
a weighted sum of kernel functions) is that we can easily compute
an estimate of any functional on the “true surface” (e.g. an integral
or a derivative of the surface that generated the information
points) by applying said functional to the spline as a function. In
fact, provided that the functional is linear and that the space F_1 is
sufficiently smooth, the estimation so obtained is an optimal error
estimate (see Traub and Woźniakowski (1980)). Using this
fact, we can easily compute the orientation of the surface at any
point, and even estimate the bending energy (however since this
is not a linear functional, it is not necessarily an optimal estimate).
These values might be used to segment the image, or locate
surface discontinuities.

A second advantage of the spline representation is that it can
easily be used in a system that has a focus of attention. It can
easily generate depth values at any points, and if the system
decides (after looking at some initial depth values) that it would
like to look at a portion of the surface in more detail, there is no
need to recalculate the spline; it simply evaluates the spline function
at the new desired points. Along these lines, the system can also
update its idea of a surface by adding a new information point,
calculating the updated spline coefficients (this costs only [...])
and updating the surface points.

A third advantage of this representation is that it is less
orientation dependent than a grid of values. If the visual system
were to rotate or translate (as long as it does not change the
relative order and spacing of the information points) then the
coefficients of the spline are the same, and the spline is simply
rotated or translated as well. This property does not hold for a
grid of data.

A final advantage is that this representation is generally more
compact (in space terms) than the grid representation. The spline
is defined by 3k values; these are the k coefficients and the
location of the k information points. Given these values, one can
reconstruct the spline or compare this spline with another spline.
This could then be used as a means of saving the reconstructed
surface. Also, given advantage three above, the spline
representation might be used in a surface recognition algorithm.

§6  Extensibility  of  the  Interpolation  Methods. 

In Grimson (1981), Grimson argued, rather convincingly,
that the correct “unreasonableness” functional was quadratic
variation. He however considered only a particular form of
functional, and there may be more appropriate functionals which are
not of the form he considered. Also, the space in which we
attempt to minimize the functional has some effect on the
interpolating surface, and we should consider other possible
spaces for reconstruction, even for the quadratic variation functional.

Inasmuch as the reproducing kernel method finds the surface
of minimal norm from a Hilbert space (or semi-norm and semi-
Hilbert), in theory it can be applied to any “unreasonableness”
functional which is the norm (semi-norm) of such a space.
However, to do this we must be able to construct the reproducing
kernel of said space, and this can be technically very difficult.
Thus we can easily apply it only in those situations when the
reproducing kernel is already known. Fortunately there are a
number of such spaces, some of which may be appropriate for
visual surface reconstruction; see Boult (1985a). In fact, a
number of these kernels exist for higher dimensions, allowing the
algorithm to be extended into arbitrary dimensions.

Given the different reproducing kernel, the only change to the
algorithm is to replace all evaluations of the old kernels with the
appropriate evaluations of the new kernel. If the null space of
this new space is different from that of the space F_1, then we
must also change the functions p(x,y) used by the algorithm.


To extend the gradient projection based algorithm to another
functional, one would first have to develop the new discretized
version of the functional. Given this discretization, one would
then derive how to compute the negative of the gradient function,
and the parameter α. Note that if the functional is not quadratic,
these last two modifications may be very difficult. Modifying the
algorithm to compute the minimization with respect to another
class of functions is not even possible since it approximates the
surface without regard to a space of functions. It would also be
very difficult if not impossible to add this feature into the
algorithm, since it would involve verifying that the surface
produced by each iteration was a member of the given class of
functions.


§7  Conclusions. 

In this paper we have compared two different algorithms for
visual surface interpolation. With the possible exception of
biological feasibility, we found that if the information was
sparse, the reproducing kernel algorithm surpassed the gradient
projection based algorithm in almost every important algorithmic
aspect. If, however, the number of information points was
comparable to the number of points at which we have to recover the
interpolation surface, then the gradient projection based algorithm
may be superior.

Given that the problem of visual interpolation is generally
posed as one with sparse information, we believe that the
reproducing kernel method will be a superior algorithm for computer
vision uses.


§8  Acknowledgements. 

I would like to thank David Lee and John Kender, both of
whom played major roles in the current investigation into the
application of the reproducing kernel method to visual surface
interpolation.


§9  References. 

Boult, Terrance, (1985a): Smoothness Assumptions in Human
and Machine Vision: Their Implications for Optimal Surface
Interpolation, Columbia University Computer Science
Department Technical Report.

Boult, Terrance, (1985b): Reproducing Kernels for Visual
Surface Interpolation, Columbia University Computer
Science Department Technical Report.

Boult, Terrance, (1986): Information Based Complexity:
Applications in Nonlinear Equations and Computer Vision,
Doctoral dissertation, in preparation.

Franke, R., and Nielson, G., (1980): Smooth Interpolation of
Large Sets of Scattered Data, International Journal for
Numerical Methods in Engineering, Vol 15, 1691-1704.

Franke, R., (1982): Scattered Data Interpolation: Tests of Some
Methods, Mathematics of Computation, 38 (157), 181-200.

Franke, R., (1984): Thin Plate Splines with Tension, to appear
CAGD.

Grimson, W.E.L., (1981): From Images to Surfaces: A
Computational Study of the Human Early Visual System,
MIT Press, Cambridge, MA.

Kender, John, David Lee and Terrance Boult, (1985):
Information Based Complexity Applied to the 2 1/2 D
Sketch, Proceedings of the Third IEEE Workshop on
Computer Vision: Representation and Control, p157-167.

Marr, David and Tomaso Poggio, (1979): A Computational
Theory of Human Stereo Vision, Proc. R. Soc. Lond. B
204, 301-328.

Meinguet, Jean, (1979a): Multivariate Interpolation at Arbitrary
Points Made Simple, Journal of Applied Mathematics and
Physics 30, 292-304.

Meinguet, Jean, (1979b): Basic Mathematical Aspects of Surface
Spline Interpolation, ISNM 45: Numerische Integration,
211-220, G. Hämmerlin ed., Basel: Birkhäuser Verlag.

Terzopoulos, Demetri, (1983): Multi-level Computational
Processes for Visual Surface Reconstruction, Computer
Vision, Graphics, and Image Processing 24, 52-96.

Traub, J.F. and H. Woźniakowski, (1980): A General Theory
of Optimal Algorithms, Academic Press, NY.

Traub, J.F., G. Wasilkowski, and H. Woźniakowski, (1983):
Information, Uncertainty and Complexity, Addison Wesley,
MA.




PREDICTING  SPECULAR  FEATURES 

Glenn Healey and Thomas O. Binford


Artificial Intelligence Laboratory
Stanford University
Stanford, California 94305


Abstract 

We show that highlights in images of objects with
specularly reflecting surfaces provide significant
information about the surfaces which generate them. A brief
survey is given of specular reflectance models which have
been used in computer vision and graphics. For our
work, we adopt the Torrance-Sparrow specular model
which, unlike most previous models, considers the
underlying physics of specular reflection from rough
surfaces. From this model we derive powerful relationships
between the properties of a specular feature in an image
and local properties of the corresponding surface.
Careful experiments with specularly reflecting objects
establish the merit of these relationships.


1.  Introduction

Shiny surfaces give us specular reflections
(highlights). A perfectly smooth shiny surface (e.g. a
perfect mirror) reflects light only in the direction such that
the angle of incidence equals the angle of reflection. For
rougher shiny surfaces (e.g. the surface of a metal fork),
specular effects are still observable. In this paper we
analyze the properties of specular reflection from rough
shiny surfaces.

There are several basic reasons why the study of
specular reflection deserves serious attention in
computer vision. Specular features are almost always the
brightest regions in an image. Contrast is often very
large across specularities; they are very prominent. This
makes them easy to locate. In addition, the presence
or absence of specular features provides immediate
constraints on the positions of the viewer and light sources
relative to the specular surface. Also, as we will show,
the properties of a specularity constrain the local shape
and orientation of the specular surface.

An ability to understand specular features is
valuable for any vision system which is required to analyze
images of shiny objects. This work, for example, began
as an attempt to allow ACRONYM [3] to reason about
specular reflections from shiny mechanical parts in the
ITA project [4]. Images of these parts typically contain
large specular regions. The recognition task becomes
considerably easier if the system is able to predict the
characteristics of these specular regions.

In this paper we examine what information can be
inferred from an image of a rough shiny surface by
considering only the physics of specular reflection.
Particular emphasis is placed on finding symbolic quasi-
invariant relationships which will hold in many different
situations (e.g. different source, viewer configurations).
In contrast to many intensity-based vision algorithms,
our relationships are based on the properties of a
relatively large number of pixels in an image. This allows
us to observe predicted features and infer local surface
shape even in noisy intensity images or in cases where
available specular models do not completely
characterize the physics of specular reflection.

2.  Review  of  Previous  Work 

Researchers in computer graphics have used
increasingly realistic specular models. Several of these
models will be discussed in the next section. In
computer vision, however, relatively few attempts have been
made to exploit the information encoded in
specularities. Ikeuchi [9] employs the photometric stereo method
and uses a distributed light source to determine the
orientation of patches on a surface. Grimson [8] uses
Phong's specular model [10] to examine specularities
from two views in order to improve the performance of
surface interpolation. Coleman and Jain [5] use four-
source photometric stereo to identify and correct for
specular reflection components. In more recent work,
Blake [1] assumes smooth surfaces and single point
specularities to derive equations to infer surface shape using
specular stereo. He shows that the same equations can
be used to predict the appearance of a specularity on
a smooth surface when using a distributed light source.
Takai, Kimura, and Sata [13] describe a model-based
vision system which recognizes objects by predicting
specular regions. As specular models and insights improve,
we expect to see more work which makes use of the
properties of specular reflection.


*  3.  Specular  Reflectance  Models 

Given a viewer, a surface patch, and a light source, a reflectance model
quantifies the intensity the viewer will perceive. The most general
reflectance models represent the perceived intensity $I$ as a sum of
three independent reflection components

$$I = I_A + I_D + I_S \qquad (1)$$

Here $I_A$ represents the ambient reflection, $I_D$ represents diffuse
(Lambertian) reflection, and $I_S$ represents specular reflection. In
this paper we restrict our attention to the $I_S$ reflection component.

We note that it is typically very easy to separate the $I_S$ reflection
component from the $I_A$ and $I_D$ reflection components in an image.
There are two distinctive properties of specular reflection. First, over
most of a surface $I_S$ is zero, but in specular regions $I_S$ is
usually very large relative to $I_A$ and $I_D$. Secondly, in regions
where the specular component is nonzero, $I_S$ changes much more rapidly
than either $I_A$ or $I_D$.

Before discussing the various specular reflectance models, we introduce
the reflection geometry (Figure 1). We consider a viewer looking at a
surface point P which is illuminated by a point light source. Define

    V = unit vector from P in direction of viewer

    N = unit surface normal at P

    L = unit vector from P in direction of source

    $H = \frac{V + L}{|V + L|}$ (unit angular bisector of V and L)

    $\alpha = \cos^{-1}(N \cdot H)$ (the angle between N and H)

    $\theta = \cos^{-1}(N \cdot V)$ (the angle between N and V)


Figure 1. The Reflection Geometry
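The definitions above translate directly into a few lines of vector arithmetic. The following is a minimal sketch, not part of the original paper; the function names `normalize`, `dot`, and `reflection_geometry` are illustrative assumptions, and the inputs are assumed to be nonzero 3-vectors with V and L not exactly opposite.

```python
import math

def normalize(v):
    """Return v scaled to unit length (v must be nonzero)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def reflection_geometry(V, N, L):
    """Return (H, alpha, theta) following the definitions in the text.

    H     : unit angular bisector of V and L
    alpha : angle between N and H, in radians
    theta : angle between N and V, in radians
    """
    V, N, L = normalize(V), normalize(N), normalize(L)
    H = normalize(tuple(v + l for v, l in zip(V, L)))
    # Clamp the cosines so rounding error cannot push acos out of its domain.
    clamp = lambda c: max(-1.0, min(1.0, c))
    alpha = math.acos(clamp(dot(N, H)))
    theta = math.acos(clamp(dot(N, V)))
    return H, alpha, theta
```

For a mirror-like configuration, where V and L make equal angles with N on opposite sides, the bisector H coincides with N and the sketch returns an alpha of (numerically) zero.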


Throughout this paper, we consider only illumination from a single point
light source. In principle, we lose no generality using this kind of
approach since we can describe distributed light sources as arrays of
point sources. Thus to handle situations involving distributed light
sources we only need to integrate the effects of an equivalent array of
point sources.

The simplest specular model assumes that specularities only occur where
the angle between L and N equals the angle between N and V and L, N,
and V all lie in the same plane. This corresponds to the situation
$\alpha = 0$ in Figure 1. Unless the surface is locally flat, this model
predicts that specularities will only be observed at isolated points on
a surface. A few experiments, however, show that this model is
inadequate for most real surfaces. Not only are observed specular
features usually larger than single points, but highlights often occur
in places which are not predicted by this model.

An empirical model for specular reflection has been developed by Phong
[10] for computer graphics. This model represents the specular component
of reflection by powers of the cosine of the angle between the perfect
specular direction and the line of sight. Thus Phong's model is capable
of predicting specularities which extend beyond a single point. While
Phong's model gives a reasonable first approximation which is useful in
many practical situations, it is possible to develop more accurate
models by examining the physics underlying specular reflection.
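As an illustration of the cosine-power idea, the sketch below evaluates a Phong-style specular term. It is a hedged example rather than the formulation used in [10] verbatim: the exponent `n`, the scale factor `k_s`, and the function name are assumptions introduced here, and negative cosines are clipped to zero.

```python
import math

def phong_specular(N, L, V, n, k_s=1.0):
    """Phong-style specular term k_s * cos(angle between R and V) ** n.

    N, L, V are unit 3-vectors (normal, direction to source, direction
    to viewer). R is the perfect specular (mirror) direction of L about N.
    n and k_s are illustrative parameters, not values from the paper.
    """
    n_dot_l = sum(a * b for a, b in zip(N, L))
    # Mirror direction: R = 2 (N . L) N - L
    R = tuple(2.0 * n_dot_l * a - b for a, b in zip(N, L))
    r_dot_v = sum(a * b for a, b in zip(R, V))
    return k_s * max(0.0, r_dot_v) ** n
```

Larger exponents concentrate the highlight around the mirror direction, while smaller exponents spread it over a wider range of viewing directions.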

The Torrance-Sparrow model [14], developed by physicists, is a more
refined model of specular reflection. This model assumes that a surface
is composed of small, randomly oriented, mirror-like facets. Only facets
with a normal in the direction of H contribute to $I_S$. The model also
predicts the shadowing and masking of facets by adjacent facets using a
geometrical attenuation factor. The resulting specular model is

$$I_S = \frac{F D G}{N \cdot V} \qquad (2)$$

where

    F = Fresnel coefficient

    D = facet orientation distribution function

    G = geometrical attenuation factor

We will analyze the effects of each factor in the model in the next few
paragraphs. The results we present in this paper are derived from (2).

The Fresnel coefficient F models the amount of light which is reflected
from individual facets. In general, F depends on the incidence angle and
physical properties of the reflecting surface. Cook and Torrance [6]
have shown that to obtain realistic graphics images, F must characterize
the color of the specularity. For metal surfaces F is approximately a
constant.

The distribution function D describes the orientation of the micro
facets relative to the average surface normal N. Blinn [2] and Cook and
Torrance [6] discuss various distribution functions. In agreement with
Torrance and Sparrow we use the Gaussian distribution function given by

$$D = K e^{-(\alpha/m)^2} \qquad (3)$$

where K is a normalization constant. Thus for a given $\alpha$, D is
proportional to the fraction of facets oriented in the direction H. The
constant m indicates surface roughness and is proportional to the
standard deviation of the Gaussian. Small values of m describe smooth
surfaces for which most of the specular reflection is concentrated in a
single direction. Large values of m are used to describe rougher
surfaces with large differences in orientation between nearby facets.
These rough surfaces produce specularities which are spread out on the
reflecting surface.
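The effect of the roughness constant m can be made concrete with a short sketch. It assumes the Gaussian form $D = K e^{-(\alpha/m)^2}$ for the distribution function; the function names and the default K = 1 are illustrative choices, not taken from the paper.

```python
import math

def facet_distribution(alpha, m, K=1.0):
    """Gaussian facet distribution D = K * exp(-(alpha / m) ** 2).

    alpha : angle between N and H in radians
    m     : roughness constant (same angular units as alpha)
    K     : normalization constant (set to 1 here for illustration)
    """
    return K * math.exp(-(alpha / m) ** 2)

def half_max_angle(m):
    """Angle alpha at which D falls to half of its peak value K."""
    return m * math.sqrt(math.log(2.0))
```

Because the half-maximum angle grows linearly with m, doubling the roughness doubles the angular extent over which facets contribute appreciably, which is the spreading of the specularity described above.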

The expression for the geometrical attenuation factor G is derived by
Torrance and Sparrow in [14]. They assume that each specular facet makes
up one side of a symmetric v-groove cavity. From this assumption, they
examine the various possible facet configurations which correspond to
shadowing or masking. They quantify the geometrical attenuation factor
as

$$G = \min\left\{1,\ \frac{2(N \cdot H)(N \cdot V)}{V \cdot H},\ \frac{2(N \cdot H)(N \cdot L)}{V \cdot H}\right\} \qquad (4)$$

We will show that in applications it is often possible to use a simpler
expression for G.

As $\theta$ increases from 0 to $\pi/2$, the viewer gradually sees a
larger part of the reflecting surface in a unit area in the view plane.
Therefore, as $\theta$ gets larger, there are correspondingly more
surface facets which contribute to the intensity perceived by the
viewer. We take this phenomenon into account in (2) by dividing by
$N \cdot V$.

4. Inferring Surface Properties

In this section, we demonstrate how we can use (2) to determine local
surface properties from specularities. In almost all situations we do
not require the full generality of (2) to infer these local properties.
Our first assumption is that F is constant. This is a very good
approximation for most specularly reflecting surfaces. We can further
simplify (2) by observing that except for a small range of angles near
grazing incidence, the value of G is unity. We will discuss this result
and the exceptional cases later. Hence the form of (2) used to determine
local surface properties is

$$I_S = \frac{F K e^{-(\alpha/m)^2}}{N \cdot V} \qquad (5)$$

Referring again to the geometry of Figure 1, we assume that the viewer
and light source are distant relative to the surface. Therefore V and L
are essentially constant and hence their angular bisector H is
essentially constant. We assume that the positions of the viewer and
light source are known. Finally, since the distance from the viewer to
the surface is large, we can approximate the perspective projection of
the imaging device with an orthographic projection.

Doubly Curved Surfaces

For a surface which is doubly curved at a specularity (i.e. both
principal curvatures are different from zero) we will be able to locate
a single point $P_0$ of maximum intensity in the image of the
specularity. From (5) we see that this point corresponds to the local
surface orientation N = H (i.e. $\alpha = 0$). Given a doubly curved
surface where H is known, we can very quickly determine the surface
orientation at $P_0$.

Figure 2 shows a typical specular image generated about an elliptic
point on a surface.


Figure 2. Specular Intensities on a Doubly Curved Surface

The closed curves are image curves of constant intensity. $P_0$
corresponds to $\alpha = 0$. As predicted by (5), intensity decreases as
we move away from $P_0$. Now suppose we examine a straight segment S in
the image which intersects $P_0$ and which begins and terminates on a
constant intensity curve C. Let the measured intensity at $P_0$ be $I_0$
and let the measured intensity on C be $I_1$. Since the exponential
factor in (5) will change very fast relative to the change in
$(N \cdot V)$, we can consider $(N \cdot V)$ to be constant on S. Define
$d\alpha$ to be the change in $\alpha$ as we move from the surface point
imaging to $P_0$ to the surface point imaging onto C along S.

If we let $K' = \frac{F K}{N \cdot V}$, then from (5) we have

$$I_0 = K', \qquad I_1 = K' e^{-(d\alpha/m)^2} \qquad (6)$$

which give us

$$d\alpha = m \sqrt{\ln I_0 - \ln I_1} \qquad (7)$$
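Relation (7) is simple enough to check numerically. The sketch below is illustrative only; the function name `delta_alpha` is an assumption, and it presumes that the background components have already been removed so that both intensities are purely specular and positive.

```python
import math

def delta_alpha(I0, I1, m):
    """Angular change d_alpha = m * sqrt(ln I0 - ln I1), as in (7).

    I0 : specular intensity at the peak P0
    I1 : specular intensity on the constant intensity curve C (0 < I1 <= I0)
    m  : roughness constant of the surface (assumed known)
    """
    if not (0.0 < I1 <= I0):
        raise ValueError("expected 0 < I1 <= I0")
    return m * math.sqrt(math.log(I0) - math.log(I1))
```

For example, when the intensity on C has fallen to 1/e of the peak value, the logarithm difference is 1 and the function returns exactly m.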

To determine the surface curvature along S we need to compute the arc
length of the curve on the surface which generated S in the image. This
will depend on V. We do this by introducing an x,y,z coordinate system
such that the surface curve S' imaging to S lies in the x-y plane
(Figure 3). Denote by $P_0'$ the point imaging to $P_0$ and place $P_0'$
at the origin. Further arrange the coordinate system so that H is
parallel to the y axis. Let V' be the projection of V onto the x-y
plane.

Figure 3. The Surface Geometry

Since the specularity will only be observable for a small range of
$\alpha$, we can approximate the arc length of S' by the length of its
projection onto the tangent plane to the surface at $P_0'$. If $l_S$
denotes the arc length of S' and $l_I$ denotes the length of S in the
image then letting $\theta'$ be the angle between V' and the y axis we
have

    [equation (8) is illegible in the source]

Using (7) and (8) we can estimate the curvature $\kappa$ on S' at $P_0'$

    [equation (9) is illegible in the source]


Note that here we are using the fact that since the length of S' is
small the surface normal at any point on S' lies approximately in the
same plane as S' and H in Figure 3.
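The curvature estimate along one image segment can be sketched as follows. This is a hedged illustration, not the paper's stated formulas: it assumes that the viewing direction lies in the plane of S', so that the foreshortening relation is $l_S = l_I / \cos\theta'$, and that the surface normal turns through $2\,d\alpha$ over the full arc (by $d\alpha$ on each side of the peak), giving $\kappa \approx 2\,d\alpha / l_S$. The function name and argument names are hypothetical.

```python
import math

def estimate_curvature(I0, I1, m, image_length, theta_prime):
    """Estimate the curvature of S' at the specular peak.

    I0, I1       : specular intensities at the peak and at the ends of S
    m            : roughness constant
    image_length : length l_I of the segment S in the image
    theta_prime  : angle between V' and the y axis, in radians

    ASSUMPTIONS (not taken from the source):
      arc length  l_S = l_I / cos(theta_prime)
      curvature   kappa = 2 * d_alpha / l_S
    """
    d_alpha = m * math.sqrt(math.log(I0 / I1))      # relation (7)
    arc_length = image_length / math.cos(theta_prime)
    return 2.0 * d_alpha / arc_length
```

On a synthetic circular arc of radius 2 viewed in its own plane, this estimate recovers a curvature within a fraction of a percent of 0.5 when the segment subtends a small angle, which is the regime the text describes.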

For any segment like S which intersects $P_0$ and begins and terminates
on C we can compute a corresponding value of $\kappa$. If we examine
these line segments for every direction in the image, then on the
tangent plane to the surface at $P_0'$ we will be examining the
corresponding line segments in every direction through $P_0'$.
Consequently, on the surface we will compute $\kappa$ for all curves
formed by intersecting the surface with planes containing N and $l_i$
(for i = 1,...,n). The largest and smallest computed values of $\kappa$
will give the principal curvatures of the surface at $P_0'$ [12].
Therefore we can determine the principal curvatures and the principal
directions at $P_0'$.
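Once a curvature value has been computed for each sampled direction, selecting the principal curvatures is a matter of taking the extremes. The following minimal sketch assumes the directional estimates are already available as (direction angle, curvature) pairs; the function name and data layout are illustrative assumptions.

```python
def principal_curvatures(samples):
    """Pick the principal curvatures from directional curvature samples.

    samples : iterable of (direction_angle, kappa) pairs, one per image
              direction through the specular peak
    returns : ((kappa_max, direction_of_max), (kappa_min, direction_of_min))
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one directional sample")
    largest = max(samples, key=lambda s: s[1])
    smallest = min(samples, key=lambda s: s[1])
    return (largest[1], largest[0]), (smallest[1], smallest[0])
```

The accuracy of the recovered principal directions is limited by the angular spacing of the sampled directions, so a denser set of segments through the peak gives a finer estimate.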

Singly  Curved  Surfaces 

If one principal curvature of a surface is zero in a specular region we
will not be able to immediately infer the local orientation as we did
for a doubly curved surface. To understand why, consider Figure 4.
Figure 4 shows a viewer looking at a tilted cylinder. To make the
example concrete, assume that L is such that H = V.


Figure 4. Viewer Observing a Tilted Cylinder


For this configuration there will be no point on the surface for which
$\alpha = 0$ (recall that H is essentially constant), yet we will still
observe a specularity in the image if at some point $\alpha$ is small
enough to give a significant value for $I_S$ in (5). The lines of
constant $I_S$ in the image appear as in Figure 5. Here $C_0$
corresponds to the smallest $\alpha$ and therefore the largest specular
intensity. As predicted by the model, $I_S$ decreases as we move away
from $C_0$.


Figure 5. Lines of Constant $I_S$ for a Cylinder


Figure 6. $I_S$ for different values of $\phi$


We observe that it is typically easy to detect the fact that a surface
is singly curved at a specularity. This is because we will observe a
line of maximum intensity (along the line of zero curvature) instead of
the point maximum we observe for the doubly curved case.

Next we examine how we can infer curvature and orientation from an image
of a singly curved surface. We cannot simply apply the analysis for the
doubly curved case since in general we do not know $\alpha$ at surface
points on the line of maximum specular intensity. In Figure 3 it is
reasonable to assume that $\alpha = 0$ at $P_0'$. For the singly curved
case, however, the relationship will be more complicated. Modify the
geometry of Figure 3 so that H lies in the y-z plane and makes an angle
$\phi$ with the x-y plane. The normal to S' at $P_0'$ is still
considered to be parallel to the y axis. If the curvature of S' at
$P_0'$ is 1/r then locally we have


$$N(x) = \left(\frac{x}{r},\ \frac{\sqrt{r^2 - x^2}}{r},\ 0\right) \qquad (10)$$

$$H = (0,\ \cos\phi,\ \sin\phi) \qquad (11)$$

$$\alpha(x) = \cos^{-1}\left(\frac{\cos\phi}{r}\sqrt{r^2 - x^2}\right) \qquad (12)$$

This expression allows us to predict the appearance of $I_S$ on a singly
curved surface as a function of curvature. Figure 6 shows how $I_S$
changes for a cylinder of fixed curvature as we change $\phi$. It is
worth noting that both the magnitude and shape of $I_S$ change as
$\phi$ increases.

Planes

For a planar surface, N is constant. Hence recalling our basic
assumptions, $I_S$ is constant across a plane. If the plane is oriented
such that $\alpha$ is small enough, then a viewer will perceive elevated
intensity reflected from the plane. As with the singly curved surface,
the magnitude of the perceived intensity will depend on $\alpha$. If
$\alpha$ is not sufficiently small, then $I_S$ will be zero at all
points on the plane. These observations provide us with two useful
pieces of information:

1. Shiny surfaces which don't generate specularities over a range of
orientations are probably planar.

2. Surfaces which produce a specularity of constant intensity over a
2-D region in the image are locally planar.
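The dependence of the specular profile on the tilt angle for a singly curved surface, as expressed by $\alpha(x) = \cos^{-1}\left(\frac{\cos\phi}{r}\sqrt{r^2 - x^2}\right)$, can be explored with a short sketch. It is an illustration under stated assumptions: the factor $1/(N \cdot V)$ is treated as constant and folded into a single scale `K_prime`, and the function names are hypothetical.

```python
import math

def alpha_on_cylinder(x, r, phi):
    """Angle between N(x) and H for a locally cylindrical surface.

    x   : position along S' measured from the peak (|x| <= r)
    r   : radius of curvature of S' (curvature 1/r)
    phi : angle between H and the x-y plane, in radians
    """
    c = math.cos(phi) * math.sqrt(r * r - x * x) / r
    return math.acos(max(-1.0, min(1.0, c)))

def specular_profile(xs, r, phi, m, K_prime=1.0):
    """Specular intensity K' * exp(-(alpha/m)^2) at each position in xs.

    ASSUMPTION: N . V is treated as constant and absorbed into K_prime.
    """
    return [K_prime * math.exp(-(alpha_on_cylinder(x, r, phi) / m) ** 2)
            for x in xs]
```

With $\phi = 0$ the peak value equals `K_prime`; as $\phi$ grows the smallest attainable $\alpha$ is $\phi$ itself, so the peak drops by the factor $e^{-(\phi/m)^2}$ and the profile flattens.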


5.  Predicting  G 

In the previous analysis we have used the fact that over most viewer,
source, surface configurations the geometrical attenuation factor G of
(4) will have the constant value 1. For large angles of incidence,
however, the character of G changes remarkably. In particular, for large
angles of incidence (glancing incidence) we see that

1. G can become as large as 10.

2. G causes a shift in the peak of the specular profile toward larger
angles of incidence.

3. G causes the specular profile to be asymmetric as a function of
$\alpha$.

It is not surprising that when these effects are present in an image,
they are rather easy to detect. For this reason, it is probably
profitable to make qualitative predictions about G in applications where
large angles of incidence are possible.
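To see where the geometrical attenuation factor departs from unity, the Torrance-Sparrow expression $G = \min\{1,\ 2(N \cdot H)(N \cdot V)/(V \cdot H),\ 2(N \cdot H)(N \cdot L)/(V \cdot H)\}$ can be evaluated directly. The sketch below is illustrative: the function names are assumptions, the inputs are assumed to be unit vectors with V and L not opposite, and only the factor G itself is computed (not the $1/(N \cdot V)$ term of the full model).

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return tuple(c / n for c in v)

def geometric_attenuation(N, V, L):
    """Geometrical attenuation factor for unit vectors N, V, L.

    Evaluates min{1, 2(N.H)(N.V)/(V.H), 2(N.H)(N.L)/(V.H)} with
    H the unit bisector of V and L.
    """
    H = _normalize(tuple(v + l for v, l in zip(V, L)))
    v_dot_h = _dot(V, H)
    masking = 2.0 * _dot(N, H) * _dot(N, V) / v_dot_h
    shadowing = 2.0 * _dot(N, H) * _dot(N, L) / v_dot_h
    return min(1.0, masking, shadowing)
```

As a usage example, with the source at 85 degrees from the normal and the viewer at 60 degrees on the other side, the shadowing term falls to roughly 0.57, whereas for moderate angles both terms exceed 1 and the function returns 1.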

6. The Laboratory Setup

A laboratory arrangement has been set up to test the derived
relationships (Figure 7). In this section of the paper, the laboratory
setup is described. In Section 7, we share insight gained about how to
best use the input image data to infer surface properties from
specularities in a working system. Experimental results are presented in
Section 8.


Figure  7.  The  Specularity  Lab 


To insure accurate measurements, the experiments are conducted on a
[illegible] foot optical table. High precision rotation and translation
stages are used to position the objects being viewed. A halogen light
source with a 3 mm wide filament is placed 20 feet from the object
surface to approximate a point source. Monochromatic image data is
obtained using a video camera and an image digitizer. A 210 mm lens is
used with the video camera to obtain high resolution across the
specularity. The resulting images are in the form of 230x256 arrays of
pixels. Each pixel has eight bits of gray level resolution. A precise
positioning device has been built to position the camera relative to the
surface. Camera-object distances of at least 21 inches are enforced to
insure that the assumed distant object condition is met. Using this
setup, it is possible to obtain more than 10 pixels across a specular
feature which is less than a centimeter wide on the surface.


Aluminum cylinders of diameter 3.5, 2.5, 1.5, and 0.75 inches are used
to test the predicted relationships (Figure 8). The cylinders have been
carefully machined to achieve uniformity of surface roughness on
individual cylinders and between different cylinders. The length of each
cylinder is divided into sections of different measured roughness. From
this we are able to study the effect of varying surface roughness on
images of specularities. In the future we plan to experiment with
different kinds of specular surfaces and also with surfaces which are
doubly curved.


Figure  8.  Experimental  Specular  Surfaces 


7.  Interpreting  Real  Images 

In this section we discuss practical considerations related to using the
results from Section 4 to infer surface properties from real images. For
the first set of experiments, each cylinder is oriented such that
$\alpha = 0$ on the line of maximum perceived intensity. For this
special case, the doubly curved surface analysis applies to our singly
curved surfaces (cylinders). Figure 9 shows a typical image obtained
using this configuration. As previously discussed, the specularities are
easily located in an image. It is reasonable to assume that
$(I_A + I_D)$ is constant in the small neighborhood of the specularity.
Thus we compute $I_S$ by subtracting a constant from the pixel values on
the specularity in an input image. $I_S$ is set to zero elsewhere.
Figure 10 shows a plot of $I_S$ along a horizontal row of pixels taken
from Figure 9 after the subtraction of $(I_A + I_D)$.


Figure 9. A Specular Image


Figure 10. Plot of $I_S$
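The background subtraction step described above can be sketched for a single image row. This is a hedged illustration: the paper does not say how the specular pixels are delimited, so the `threshold` parameter used here to decide which pixels lie on the specularity is an assumption, as are the function and argument names.

```python
def isolate_specular(row, background, threshold):
    """Return the specular component I_S for one row of pixel values.

    row        : sequence of measured pixel intensities
    background : constant estimate of (I_A + I_D) near the specularity
    threshold  : ASSUMED margin above background that marks a pixel as
                 belonging to the specularity
    Pixels not on the specularity are set to zero.
    """
    specular = []
    for pixel in row:
        excess = pixel - background
        specular.append(excess if excess > threshold else 0)
    return specular
```

In practice the background level might be estimated from pixels just outside the highlight; that choice, like the threshold, is left to the application.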


Computing the Curvature

Suppose now that we want to make the curvature computation described in
Section 4 on the horizontal image line plotted in Figure 10. The problem
is to select an appropriate pixel segment S in this image row which
intersects the line of maximum $I_S$ and begins and terminates on pixels
of equal specular intensity. An equivalent problem is to draw a
horizontal line in Figure 10 which intersects the $I_S$ curve at 2
points of equal specular intensity. In principle, any image segment
which satisfies the stated conditions will suffice. In practice,
however, some choices for S are better than others. Figure 11 shows the
case where S is chosen to be only in the high intensity part of the
$I_S$ curve. For this case S will only be a few pixels long and any
error in the measurement will cause a large relative error in |S|.
Figure 12 depicts a different kind of problem. Here S is drawn to
connect two pixels of small specular intensity. Now S is many pixels
long, but choosing the terminating points is difficult, if not
impossible. This is because $I_S$ changes only slightly from pixel to
pixel on the fringes of the specular profile. Hence for this case there
will probably be a large spatial range of pixels which are reasonable
terminating points for S. Hence by choosing one of these, we risk
introducing a large error in |S|.


Figure 11. S Chosen Too High


Figure 12. S Chosen Too Low


The solution, of course, is to have S begin and terminate at pixels of
intermediate specular intensity. Since $I_S$ is approximately Gaussian,
we can determine a Gaussian fit for $I_S$ and compute its standard
deviation $\sigma$. Let $\mu$ be the pixel of maximum specular intensity
(the mean of the Gaussian) and let T be the pixel interval, symmetric
about $\mu$,

$$T = [\mu - w,\ \mu + w] \qquad (13)$$

over which S is defined. Consistent with the heuristic arguments given
above, experiments have shown (see Section 8) that an accurate curvature
computation will result if we have

$$\sigma < w < 2\sigma \qquad (14)$$

By imposing this restriction on T, we guarantee that we will choose an S
which avoids either of the previously discussed pitfalls. We may improve
our accuracy by performing the computations of Section 4 for several
different S segments for which w satisfies (14). The curvature $\kappa$
can then be obtained by averaging the curvatures computed for the
different segments.
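The selection of segment half-widths can be sketched as follows. The paper does not specify how the Gaussian fit is obtained; this example substitutes a simple moment-based estimate (intensity-weighted mean and standard deviation), which is an assumption and will be biased if the profile is truncated or clipped. The function names are hypothetical.

```python
import math

def fit_gaussian_moments(profile):
    """Estimate (mu, sigma) of a specular profile by its moments.

    profile : list of non-negative I_S values indexed by pixel position
    NOTE: a moment estimate stands in for the unspecified Gaussian fit.
    """
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile has no specular energy")
    mu = sum(i * p for i, p in enumerate(profile)) / total
    var = sum(p * (i - mu) ** 2 for i, p in enumerate(profile)) / total
    return mu, math.sqrt(var)

def candidate_half_widths(sigma):
    """Integer half-widths w (in pixels) with sigma < w < 2 * sigma."""
    upper = int(math.ceil(2.0 * sigma))
    return [w for w in range(1, upper + 1) if sigma < w < 2.0 * sigma]
```

Each candidate w defines one segment S over the interval from mu - w to mu + w; computing a curvature for every candidate and averaging the results follows the procedure described in the text.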

Truncation Effects

Since specularities are usually the brightest features in images,
specular intensities are often too large to be represented in the number
of bits per pixel allowed by the digitizing hardware. If this is the
case, the specularity is said to be truncated. Figure 13 shows $I_S$ for
a truncated specularity. The obvious way to deal with this situation is
to avoid it. One avoidance technique is to take multiple images in which
differing amounts of light are allowed to pass through the lens. This
can be achieved either by adjusting the lens aperture or by using
filters. Another possible solution is to control the illumination to
eliminate the possibility of truncation.


Figure 13. A Truncated Specularity

If inferences must be made from a single image, then it is arguably
better to allow truncation to occur. In the case where input images have
eight bits per pixel, intensities will range from 0 to 255. In many
applications it is possible to weaken the incident illumination so that
no truncation occurs. In doing this, however, we cause pixels on the
$I_S$ curve which previously had significant specular intensities (on
the truncated specular feature) to have negligible specular intensities.
The net effect of eliminating truncation is to decrease the width of the
specular feature and make width measurements more susceptible to small
errors.

8.  Experimental  Results 

The technique described in Section 4 has been applied to the task of
determining the curvature of each of the four cylindrical surfaces.
Figure 14 shows the $I_S$ profile for each of the four surfaces $c_1$,
$c_2$, $c_3$, and $c_4$ taken along the line of maximum curvature. Note
that we allow truncation. Table I displays the results of the
experiments. For these measurements, w (see Section 7) is taken to be
$1.85\sigma$. We have observed that for the larger (less curved)
cylinders $c_1$ and $c_2$ the computed curvature varies by less than 10%
as we let w vary in the interval $\sigma < w < 2\sigma$. The reason for
this desirable behavior is the fact that we are able to measure many
pixels across specularities on these cylinders. On the other hand, if we
attempt to measure the specular widths for the smaller cylinders $c_3$
and $c_4$ near the specular intensity peaks, it is possible to get
errors larger than [illegible]% in the computed curvature. This is
because near these peaks, we are only able to measure a few pixels
across the specularities. Therefore for high curvature surfaces, it is
advisable to measure the specular widths as near to the base of the
curve as possible without falling victim to the problem discussed in
Section 7. As is evident from Table I, the more pixels which can be
measured across a specular feature, the more accurately we can compute
the curvature. Hence one immediate way to improve results is to digitize
higher resolution images.


Figure 14. $I_S$ for Four Different Surface Curvatures


Surface   Width in Pixels   $\kappa$      Computed $\kappa$   Error
$c_1$     24.3              [illegible]   [illegible]         [illegible]
$c_2$     17.7              [illegible]   0.4101              [illegible]
$c_3$     [illegible]       [illegible]   [illegible]         [illegible]
$c_4$     [illegible]       1.3514        [illegible]         [illegible]

Table I. Curvature Computations for the Four Surfaces

An interesting observation can be made from Figure 14. The specular
model (2) described in Section 3 predicts that each of the four surfaces
should generate the same maximum value for $I_S$ when $\alpha = 0$. This
prediction is intuitively appealing, since it seems that if we examine a
small enough patch on any of the surfaces that patch should be
approximately planar. But a cursory glance at Figure 14 seems to imply
that a highly curved surface produces a smaller maximum value for $I_S$
than a flatter surface. The model, however, is correct. The problem is
that for the highly curved surfaces we are unable to shrink a pixel down
to where the surface area it images is approximately planar. Even within
the single maximum pixel, $\alpha$ is changing and cannot be considered
to be constant zero. Hence the intensity value at the maximum pixel will
be some kind of average specular intensity over a range of $\alpha$ and
will not give us the true maximum $I_S$. Thus it is understandable that
maximum measured intensity seems to increase as surface curvature
decreases. It follows that to compute the maximum specular intensity in
applications, we should use a surface of small curvature.
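The averaging effect within the peak pixel can be illustrated numerically. The sketch below is a simplified model, not a reproduction of the experiment: it assumes that near the peak $\alpha \approx \kappa x$ for position x on the surface, ignores foreshortening, normalizes the true peak to 1, and uses a midpoint rule for the average. All names and parameter values are illustrative.

```python
import math

def pixel_averaged_peak(kappa, pixel_width, m, samples=1000):
    """Average of exp(-(alpha/m)^2) over the pixel centred on the peak.

    kappa       : surface curvature along the profile
    pixel_width : extent on the surface imaged by one pixel
    m           : roughness constant
    ASSUMPTION: alpha = kappa * x near the peak (small angle approximation).
    """
    total = 0.0
    for i in range(samples):
        # Midpoint of the i-th sub-interval of [-pixel_width/2, pixel_width/2]
        x = (i + 0.5) / samples * pixel_width - pixel_width / 2.0
        total += math.exp(-((kappa * x) / m) ** 2)
    return total / samples
```

With the same pixel footprint and roughness, a more highly curved surface sweeps through a wider range of $\alpha$ inside the peak pixel, so its averaged peak is lower even though the underlying maximum is identical.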

9. Summary and Implications

Understanding specular reflections is important for any computer vision
system which must interpret images of shiny objects. Using a model
developed by optics researchers, we have shown that the local
orientation and principal curvatures of a specular surface can be
determined by examining image intensities on a specularity. Unlike
previous work, our derivations have included the effects of surface
roughness and microstructure on the appearance of specular features.

A laboratory setup has been described which allows us to test our
theoretical relationships. Very good results have been achieved despite
the fact that the high intensity and small spatial extent of
specularities make measurements difficult. Practical issues related to
the implementation of our analytical results have been discussed. These
issues have been addressed by the presentation of tested methods which
successfully apply our results to the problem of interpreting real
images.

The ability to predict intensity patch features such as specularities
opens up interesting possibilities for model-based vision. Previous
model-based vision systems have restricted their predictions to the
shapes of image contours which will be observed for a given model. An
ability to predict intensity patch features will significantly enhance
the capabilities of a model-based vision system. Clearly it is
advantageous to be able to make stronger predictions about an image by
using additional information about the imaging process. A perhaps more
important advantage of predicting intensity patches is that this
prediction can provide strong guidance to low level intensity based
visual processes such as edge detection. By making predictions about the
appearance of intensity patch features we can hope to further unify the
goals of the low level and high level mechanisms of a model-based vision
system.

10.  Future  Work 

One plan for future work is to continue conducting experiments and
working out details associated with inferring local surface properties
from specularities. In the near future experiments are planned to study
the effects of different specular materials, different kinds of
roughness, and different surface orientations. We also plan to
experiment with doubly curved surfaces and to develop a model which
predicts the combined effects of rough surfaces and distributed light
sources. It is expected that as experiments and analysis continue, we
will be able to develop more refined algorithms for using specularities
to understand images.

Another plan for future work is to formulate a general framework which
will allow the qualitative prediction of the structure of intensity
patches for a model. The primary challenge will be to isolate
singularities in the structure of the intensity patches. As an example,
a singularity occurs when we rotate a singly curved object from an
orientation where a specularity is visible to an orientation where the
specularity disappears. An ability to predict the structure and
singularities of image intensity patches will open up significant new
possibilities for model-based vision systems.

Acknowledgements

This work has been supported by an NSF graduate fellowship, AFOSR
contract [illegible], and ARPA contract N00039-84-C-0211. The authors
would like to thank Professor Bert Hesselink for generously providing
laboratory space and equipment. We would also like to thank Kami Hi-e
for his work in designing the mechanical parts for the experiments.


References

[1] Blake, A., "Specular Stereo," Proceedings of IJCAI-85
(Los Angeles, August 1985), 973-976.

[2] Blinn, J., "Models of Light Reflection for Computer
Synthesized Pictures," Computer Graphics, 11(2)
(1977), 192-198.

[3] Brooks, R., "Symbolic Reasoning Among 3-D Models
and 2-D Images," Artificial Intelligence, 17 (1981),
285-348.

[4] Chelberg, D. and Lim, H. and Cowan, C.,
"ACRONYM Model-based Vision in the Intelligent
Task Automation Project," Proceedings of Image
Understanding Workshop (1984).

[5] Coleman, E.N. Jr. and Jain, R., "Obtaining 3-D
Shape of Textured and Specular Surfaces Using
Four-Source Photometry," Computer Graphics and Image
Processing, 18 (1982), 309-328.

[6] Cook, R. and Torrance, K., "A Reflectance Model for
Computer Graphics," Computer Graphics, 15(3) (1981),
307-316.

[7] Foley, J. and Van Dam, A., Fundamentals of
Interactive Computer Graphics, Addison-Wesley, 1982.

[8] Grimson, W.E.L., "Binocular Shading and Visual
Surface Reconstruction," MIT AI Memo 697 (1982).

[9] Ikeuchi, K., "Determining Surface Orientations of
Specular Surfaces by Using the Photometric Stereo
Method," IEEE PAMI 3(6) (1981), 661-669.

[10] Phong, B., "Illumination for Computer Generated
Pictures," Communications of the ACM 18 (1975),
311-317.

[11] Shafer, S., "Optical Phenomena in Computer
Vision," Univ. of Rochester TR 135 (1984).

[12] Spivak, M., Differential Geometry, Publish or
Perish, Inc., 1979.

[13] Takai, K. and Kimura, K. and Sato, T., "A Fast
Visual Recognition System of Mechanical Parts by Use
of Three Dimensional Model," source unknown. (First
author is with CANON INC. in Tokyo.)

[14] Torrance, K. and Sparrow, E., "Theory for
Off-Specular Reflection from Roughened Surfaces," Journal
of the Optical Society of America, 57 (1967), 1105-1114.

[15] Woodham, R., "Photometric Stereo: A Reflectance
Map Technique for Determining Surface Orientation
From Image Intensity," Proc. SPIE, vol. 155 (1978).




A Provably Convergent Algorithm for Shape from Shading


David  Lee1 


AT&T Bell Laboratories
600  Mountain  Ave 
Murray  Hill,  NJ  07974 


Abstract 

The problem of shape from shading and occluding
boundaries is reduced to solving a system of non-linear
equations by using the smoothing-spline. We present an
iterative algorithm for solving this system of equations. We
prove that the algorithm converges, and we analyze its
complexity. We show the existence and uniqueness of the
smoothing-spline and the solution of the system of equations.

1.  Introduction 

Research in shape from shading explores the
relationship between image brightness and object shape. A
great deal of information is contained in the image
brightness values, since image brightness is related to surface
orientation. Algorithms designed to determine shape from
shading include characteristic strip expansion
[1, 2, 3, 4, 13, 14], photometric stereo [3, 8, 13, 15], and
numerical shape from shading and occluding boundaries
[7].

This work is motivated by the interesting paper by
Ikeuchi and Horn [7]. That paper addresses the problem of
numerical shape from shading and occluding boundaries, and
results in a large system of nonlinear equations. An iterative
algorithm is proposed for solving it. However, the existence
and uniqueness of the solution of the system remain a
problem, and the convergence of the iterative method has
not been established.

We report our preliminary work in progress on
this problem. We propose using a different iterative
algorithm for solving the system of nonlinear equations
derived in [7], and discuss its convergence. We study the
existence and the uniqueness of the solution of the system as
well.

1 This work was done when the author was at Columbia University, supported
in part by an IBM graduate fellowship, in part by NSF Grant DCR-82-14322, and
in part by DARPA Grant N00039-84-C-0165.

In Section 2, we describe the work by Ikeuchi and
Horn briefly. In Section 3, we study a different iterative
algorithm for solving the system of equations. In Section 4,
we discuss the convergence of the iterative algorithm, and
the existence and the uniqueness of the solution of the
system of equations involved. An example for the case of a
Lambertian surface is given in Section 5. In Section 6, we
study the complexity of our algorithm. A different
approach, from the point of view of the general theory of
information-based complexity, is discussed briefly in Section
7. We conclude this paper by pointing out the limitations
of this work and new directions of research.

2.  Numerical  Shape  from  Shading  and 
Occluding  Boundaries 

In this section, we summarize the relevant part of [7];
readers are referred to the original paper by Ikeuchi and
Horn for details.

The goal of numerical shape from shading and
occluding boundaries is to determine surface orientations
from image brightness and boundary conditions. We discuss
the representation of surface orientations first, and then the
algorithm proposed by Ikeuchi and Horn.

2.1.  Gaussian  Sphere  and  Stereographic  Projection 

Surface orientation is quantified by the surface normal,
a unit vector in $R^3$. A surface normal can be represented
by a point on a unit sphere, called the Gaussian sphere.
The part of the surface facing us corresponds to the
northern hemisphere, while points on the occluding
boundaries correspond to the points on the equator.

The northern hemisphere is then projected onto a
plane, the $\xi$-$\eta$ plane, which is tangent to the sphere at the
north pole. The projection center is the south pole. This is
called stereographic projection. This is a conformal
mapping, and the northern hemisphere is mapped onto a
closed disc of radius 2 in the $\xi$-$\eta$ plane. Therefore, points in
this disc represent the surface orientations. Notice that
orientations of the occluding boundaries correspond to the
points on the circumference of that disc.
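
The projection just described can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and it assumes the unit sphere is centered at the origin with the viewer along the +z axis, so the tangent plane is z = 1 and the projection center is (0, 0, -1).

```python
import numpy as np

def normal_to_stereographic(n):
    """Map a unit normal (nx, ny, nz) with nz > -1 to (xi, eta).

    The ray from the south pole (0, 0, -1) through n meets the
    plane z = 1 at parameter t = 2 / (1 + nz).
    """
    nx, ny, nz = n
    scale = 2.0 / (1.0 + nz)
    return scale * nx, scale * ny

def stereographic_to_normal(xi, eta):
    """Inverse map: recover the unit normal from (xi, eta)."""
    s = xi * xi + eta * eta
    return np.array([4.0 * xi, 4.0 * eta, 4.0 - s]) / (4.0 + s)
```

Under these assumptions the north pole maps to the origin of the plane and every equator point (an occluding-boundary orientation) maps to a point at distance 2 from the origin.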

2.2.  Image-irradiance  Equation  and  Boundary 
Conditions 

The surface orientations are related to the image
brightness by the following image-irradiance equation,

$$R(\xi, \eta) = E(x,y), \quad (x,y) \in D, \tag{1}$$

where $D$ is a unit square region, $\xi = \xi(x,y)$ and
$\eta = \eta(x,y)$ represent the surface orientation, $E(x,y) \in C$ is the
brightness measured at the point $(x,y)$, and $R(\cdot,\cdot) \in C^1$
can be determined experimentally or theoretically if some
information is available about the incident, emittance and
phase angles; see [6, 9]. The image-irradiance equation
provides information for determining the surface orientation
from image brightness.

From Equation 1 alone, one cannot determine the
surface orientation $(\xi, \eta)$ at each point $(x,y)$. We need
supplementary information from boundary conditions. The
outline of the projection of an object in the image plane is
called its silhouette. Some parts of it may correspond to
sharp edges on the surface, and some parts to places where
the surface curves around smoothly. The smooth parts of
the surface correspond to the parts of the silhouette, called
occluding boundaries, which supply important information
about the shape of an object.

Without loss of generality, we assume that $R$,
$E(x,y) \ge 0$, and that $E(x,y) = 0$ iff $(x,y)$ belongs to the
occluding boundaries iff $\xi^2(x,y) + \eta^2(x,y) = 4$ iff
$R(\xi(x,y), \eta(x,y)) = 0$. We further assume that the surface
orientations on the boundaries of $D$ are known.


2.3.  Consistency  Constraint 

We assume that the surface we perceive is smooth.
More specifically, we assume that the first order partial
derivatives of $\xi(x,y)$ and $\eta(x,y)$ are square integrable. We
also assume that real world surfaces tend to be stable, and
the stability is measured by

$$\nu = \iint_D \left( \xi_x^2 + \xi_y^2 + \eta_x^2 + \eta_y^2 \right) dx\,dy, \tag{2}$$

where $\xi_x$, $\xi_y$, $\eta_x$ and $\eta_y$ denote the partial derivatives
of $\xi$ and $\eta$ with respect to $x$ and $y$, respectively. The
surface consistency constraint is quantified as minimizing
$\nu$.

Thus, observing the image-irradiance Equation 1 and
boundary information, we are seeking functions $\xi(x,y)$ and
$\eta(x,y)$, which tend to minimize $\nu$ in Equation 2. An
approach used in [7] is spline-smoothing [8]. It finds $\xi$ and
$\eta$ which minimize

$$\mu = \mu(\xi, \eta) = \iint_D \left\{ \xi_x^2 + \xi_y^2 + \eta_x^2 + \eta_y^2 + \lambda \left[ R(\xi,\eta) - E(x,y) \right]^2 \right\} dx\,dy, \tag{3}$$

where the penalty parameter $\lambda$ is set according to the
accuracy of the measurement of the image brightness and
the preciseness of the modeling of the lighting environment
by $R$. The noisier the measurement and the less precise the
modeling, the smaller the parameter $\lambda$. For example, in [7],
$\lambda$ is set, heuristically, in inverse proportion to the
root-mean-square of the noise in the image brightness
measurements.

2.4.  An  Iterative  Algorithm 

In the previous discussion, we described the
image-irradiance equation, boundary conditions, the smoothness and
consistency constraint, and arrived at spline-smoothing. All
quantities involved are continuous functions.

We now discretize the unit square region $D$ in the
$xy$-plane with mesh size $h$, and discretize $\mu$ by using difference
operators instead of differential operators, and summations
instead of integrals. The corresponding discrete
smoothing-spline, or DSS for short, minimizes

$$\mu = \sum_{i,j} \left( s_{ij} + \lambda\, r_{ij} \right), \tag{4}$$

where

$$s_{ij} = \left[ (\xi_{i+1,j} - \xi_{ij})^2 + (\xi_{i,j+1} - \xi_{ij})^2 + (\eta_{i+1,j} - \eta_{ij})^2 + (\eta_{i,j+1} - \eta_{ij})^2 \right] / h^2,$$

$$r_{ij} = \left[ R(\xi_{ij}, \eta_{ij}) - E_{ij} \right]^2,$$

and where $\xi_{ij}$ and $\eta_{ij}$ represent the surface orientation at
the regular grid point $(ih, jh)$, and $E_{ij}$ is the brightness
measured at the grid point $(ih, jh)$. The above minimization
is subject to the boundary constraints, i.e., $\xi_{ij}$ and $\eta_{ij}$ are
known if $(ih, jh)$ belongs to the boundaries.

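To minimize $\mu$ in Equation 4, we have to solve a large
system of sparse nonlinear equations:

$$\xi_{ij} = \bar{\xi}_{ij} - 4^{-1} \lambda h^2 \left[ R(\xi_{ij}, \eta_{ij}) - E_{ij} \right] \partial R(\xi_{ij}, \eta_{ij}) / \partial \xi,$$
$$\eta_{ij} = \bar{\eta}_{ij} - 4^{-1} \lambda h^2 \left[ R(\xi_{ij}, \eta_{ij}) - E_{ij} \right] \partial R(\xi_{ij}, \eta_{ij}) / \partial \eta, \tag{5}$$

where

$$\bar{\xi}_{ij} = \left[ \xi_{i+1,j} + \xi_{i,j+1} + \xi_{i-1,j} + \xi_{i,j-1} \right] / 4,$$
$$\bar{\eta}_{ij} = \left[ \eta_{i+1,j} + \eta_{i,j+1} + \eta_{i-1,j} + \eta_{i,j-1} \right] / 4.$$

To solve Equation 5 for $\xi_{ij}$ and $\eta_{ij}$, Ikeuchi and Horn [7]
proposed the following iterative algorithm:

$$\xi_{ij}^{(m+1)} = \bar{\xi}_{ij}^{(m)} - 4^{-1} \lambda h^2 \left[ R(\xi_{ij}^{(m)}, \eta_{ij}^{(m)}) - E_{ij} \right] \partial R(\xi_{ij}^{(m)}, \eta_{ij}^{(m)}) / \partial \xi,$$
$$\eta_{ij}^{(m+1)} = \bar{\eta}_{ij}^{(m)} - 4^{-1} \lambda h^2 \left[ R(\xi_{ij}^{(m)}, \eta_{ij}^{(m)}) - E_{ij} \right] \partial R(\xi_{ij}^{(m)}, \eta_{ij}^{(m)}) / \partial \eta. \tag{6}$$

We can repeatedly use the values from the $m$th iteration on
the right-hand side to compute the values for the $(m+1)$st
iteration on the left-hand side. The initial values are
supplied by the boundary conditions, i.e., $\xi_{ij}$ and $\eta_{ij}$ are
known if $(ih, jh)$ belongs to the boundaries.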
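
One sweep of the iteration of Ikeuchi and Horn can be sketched as follows. This is a minimal sketch, not the implementation of [7]: the function names are hypothetical, the arrays are assumed to hold the grid including its fixed boundary ring, and the reflectance map is assumed to be the Lambertian one with the source at the viewer, $R = (4 - \xi^2 - \eta^2)/(4 + \xi^2 + \eta^2)$.

```python
import numpy as np

def lambertian_R(xi, eta):
    """Assumed reflectance map: Lambertian surface, source at the viewer."""
    s = xi**2 + eta**2
    return (4.0 - s) / (4.0 + s)

def lambertian_grad(xi, eta):
    """Partial derivatives of the assumed R with respect to xi and eta."""
    c = -16.0 / (4.0 + xi**2 + eta**2) ** 2
    return c * xi, c * eta

def neighbor_average(f):
    """Mean of the four neighbors of every interior grid point."""
    return 0.25 * (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2])

def ikeuchi_horn_step(xi, eta, E, lam, h):
    """One sweep: new interior values from the old ones; boundary ring is kept."""
    xi_new, eta_new = xi.copy(), eta.copy()
    xi_i, eta_i = xi[1:-1, 1:-1], eta[1:-1, 1:-1]
    resid = lambertian_R(xi_i, eta_i) - E[1:-1, 1:-1]
    R_xi, R_eta = lambertian_grad(xi_i, eta_i)
    xi_new[1:-1, 1:-1] = neighbor_average(xi) - 0.25 * lam * h**2 * resid * R_xi
    eta_new[1:-1, 1:-1] = neighbor_average(eta) - 0.25 * lam * h**2 * resid * R_eta
    return xi_new, eta_new
```

With `lam = 0` the sweep reduces to plain neighbor averaging, which makes the role of the brightness term as a correction to a smoothing step easy to see.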

The existence and uniqueness of the solution remain a
problem, and the convergence of the iterative method has
not been established. Furthermore, Equation 5 is a
necessary condition for minimizing $\mu$ in Equation 4, with
the additional constraint that in a DSS, $\xi_{ij}^2 + \eta_{ij}^2 < 4$,
if $(ih, jh)$ does not belong to the boundary points.


3. A New Iterative Algorithm

In this section, we study a different iterative algorithm
for solving Equation 5, which for a range of $\lambda$ converges to
the unique solution of the system: the unique DSS
minimizing $\mu$ in Equation 4. For an arbitrary $\lambda$, the
uniqueness of the solution and the convergence of the
algorithm need further study.

We first discuss a matrix which is related to the
algorithm, and then we present the algorithm.


3.1.  Matrix  A 

Let $K + 1 = h^{-1}$ and $N = K^2$. In the rest of this
part, we will deal with an $N \times N$ matrix

$$A = \begin{pmatrix} B & -I & & \\ -I & B & -I & \\ & \ddots & \ddots & \ddots \\ & & -I & B \end{pmatrix}, \tag{7}$$

where the $K \times K$ matrix

$$B = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 4 \end{pmatrix}. \tag{8}$$

We state a few facts about matrix $A$ below; for the
details, see [11].

Matrix $A$ is symmetric and positive definite, with
eigenvalues $\{\lambda_{ij}\}$, $i,j = 1, \ldots, K$, where

$$\lambda_{ij} = 4 \left[ \sin^2 \frac{i\pi}{2(K+1)} + \sin^2 \frac{j\pi}{2(K+1)} \right]. \tag{9}$$

The inverse of $A$, $A^{-1}$, is also symmetric and positive
definite, with eigenvalues $\{\mu_{ij}\}$, $i,j = 1, \ldots, K$, where

$$\mu_{ij} = \lambda_{ij}^{-1}. \tag{10}$$

$A^{-1}$ can be decomposed as

$$A^{-1} = H \Lambda H, \tag{11}$$

where the diagonal matrix

$$\Lambda = \mathrm{diag}\{\mu_{ij}\}, \quad i,j = 1, \ldots, K, \tag{12}$$

and $H$ is the tensor product

$$H = S \otimes S, \tag{13}$$

where the $(i,j)$th entry of the $K \times K$ matrix $S$ is

$$S_{ij} = [2/(K+1)]^{1/2} \sin[\pi i j/(K+1)]. \tag{14}$$

Multiplying $A^{-1}$ by a vector costs $O(N^2)$, using the
conventional method. Since we can decompose $A^{-1} =
(S \otimes S) \Lambda (S \otimes S)$ and $\Lambda$ is a diagonal matrix, taking advantage of
the structure of the entries of $S$, $S_{ij}$, we can use Fast
Fourier Transforms (FFT) for the multiplication, which costs
$O(N \log N)$.
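
The decomposition can be checked numerically on a small grid. The sketch below is an illustration with hypothetical helper names; it assumes row-major ordering of the unknowns and, for brevity, applies $S$ as a dense matrix (about $O(N^{3/2})$ work), where a fast sine transform would give the $O(N \log N)$ cost quoted in the text.

```python
import numpy as np

def build_A(K):
    """Block tridiagonal A of Equation 7, written as a Kronecker sum."""
    T = 2.0 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
    I = np.eye(K)
    return np.kron(I, T) + np.kron(T, I)

def sine_matrix(K):
    """Matrix S of Equation 14 (symmetric and orthogonal)."""
    idx = np.arange(1, K + 1)
    return np.sqrt(2.0 / (K + 1)) * np.sin(np.pi * np.outer(idx, idx) / (K + 1))

def eigenvalues(K):
    """K x K array of the eigenvalues lambda_ij of Equation 9."""
    s = 4.0 * np.sin(np.arange(1, K + 1) * np.pi / (2.0 * (K + 1))) ** 2
    return s[:, None] + s[None, :]

def apply_A_inverse(v, K):
    """Compute A^{-1} v as (S x S) Lambda (S x S) v without forming A^{-1}."""
    S = sine_matrix(K)
    V = np.asarray(v, dtype=float).reshape(K, K)
    W = (S @ V @ S) / eigenvalues(K)   # transform, then scale by mu_ij
    return (S @ W @ S).ravel()         # transform back
```

Reshaping the vector into a K x K array turns the tensor product $S \otimes S$ into one multiplication by $S$ on each side, which is where a fast transform would be substituted.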

3.2.  A  New  Iterative  Algorithm 

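Equation 5 can be rewritten as

$$M x = -\lambda h^2\, b(x), \tag{15}$$

where

$$M = \begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}, \tag{16}$$

where $A$ is given in Equation 7, and

$$b = \left[ \ldots, \{R(\xi_{ij}, \eta_{ij}) - E_{ij}\}\, \partial R(\xi_{ij}, \eta_{ij}) / \partial \xi, \ldots, \{R(\xi_{ij}, \eta_{ij}) - E_{ij}\}\, \partial R(\xi_{ij}, \eta_{ij}) / \partial \eta, \ldots \right]^T, \tag{17}$$

and

$$x = \left[ \xi_{1,1}, \ldots, \xi_{1,K}, \ldots, \xi_{K,K}, \eta_{1,1}, \ldots, \eta_{1,K}, \ldots, \eta_{K,K} \right]^T. \tag{18}$$

Since $A$ is non-degenerate, $M$ is also non-degenerate,
and therefore, Equation 15 is equivalent to

$$x = -\lambda h^2 M^{-1} b(x). \tag{19}$$

We propose using the following iterative algorithm

$$x^{(m+1)} = -\lambda h^2 M^{-1} b(x^{(m)}), \tag{20}$$

where $x^{(0)}$ is an arbitrary initial element.

We discuss the convergence of the iterative algorithm
in Equation 20, the existence and the uniqueness of the
solution of Equation 15, and the existence and the
uniqueness of the DSS, which minimizes $\mu$ in Equation 4, in
the next section.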
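
A small numerical sketch of the iteration in Equation 20 follows. Several things in it are assumptions rather than part of the text: the helper names are hypothetical, the reflectance map is the Lambertian one with the source at the viewer, the known boundary values are moved to the right-hand side as an explicit term (Equation 15 leaves them implicit), and $A$ is solved densely for brevity instead of through the sine-transform decomposition.

```python
import numpy as np

def R(xi, eta):
    """Assumed Lambertian reflectance map, source at the viewer."""
    s = xi**2 + eta**2
    return (4.0 - s) / (4.0 + s)

def grad_R(xi, eta):
    c = -16.0 / (4.0 + xi**2 + eta**2) ** 2
    return c * xi, c * eta

def build_A(K):
    T = 2.0 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
    return np.kron(np.eye(K), T) + np.kron(T, np.eye(K))

def boundary_term(f):
    """Sum of the known boundary neighbors of each interior node."""
    g = np.zeros_like(f[1:-1, 1:-1])
    g[0, :] += f[0, 1:-1]
    g[-1, :] += f[-1, 1:-1]
    g[:, 0] += f[1:-1, 0]
    g[:, -1] += f[1:-1, -1]
    return g

def lee_iteration(xi, eta, E, lam, h, steps):
    """Iterate x <- M^{-1}(g - lam h^2 b(x)); arrays include the boundary ring."""
    K = xi.shape[0] - 2
    A = build_A(K)
    g_xi, g_eta = boundary_term(xi), boundary_term(eta)
    xi, eta = xi.copy(), eta.copy()
    for _ in range(steps):
        xi_i, eta_i = xi[1:-1, 1:-1], eta[1:-1, 1:-1]
        resid = R(xi_i, eta_i) - E[1:-1, 1:-1]
        R_xi, R_eta = grad_R(xi_i, eta_i)
        rhs_xi = (g_xi - lam * h**2 * resid * R_xi).ravel()
        rhs_eta = (g_eta - lam * h**2 * resid * R_eta).ravel()
        xi[1:-1, 1:-1] = np.linalg.solve(A, rhs_xi).reshape(K, K)
        eta[1:-1, 1:-1] = np.linalg.solve(A, rhs_eta).reshape(K, K)
    return xi, eta
```

Each pass evaluates the nonlinear term at the current iterate and then solves one linear system with $A$ for each of the two orientation components, in contrast to the pointwise averaging of the earlier method.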


4.  Convergence  of  the  Algorithm 

Our goal is to find a DSS $x^* = [\xi_{1,1}^*, \ldots, \xi_{1,K}^*, \ldots,
\xi_{K,K}^*, \eta_{1,1}^*, \ldots, \eta_{1,K}^*, \ldots, \eta_{K,K}^*]^T$ which minimizes $\mu$ in
Equation 4, subject to the boundary constraints, i.e., the
surface orientations are known on boundaries. We call a
DSS regular, if $(\xi_{ij}^*)^2 + (\eta_{ij}^*)^2 < 4$, when $(ih, jh)$ is not a
boundary point. A regular DSS does not generate false
occluding boundary points.

Since $(\xi_{ij}, \eta_{ij})$ is in the closed disc with radius 2,
denoted by $S$, $x$ is defined on a compact set in $R^{2N}$, $S^N$.
Since $\mu$ in Equation 4 is a continuous function of $x$, it
attains its minimum on $S^N$, and therefore, a DSS exists. We
first show that a DSS is regular, and thus a regular DSS
exists. We then show that the DSS is unique and that the
algorithm in Equation 20 converges to this unique DSS.

We assume that there exist at least two boundary
points, on which the surface orientations are different.

To prove that a DSS is regular, we need

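Lemma 4.1 Let $P_0$ be a point on the circumference
of a disc, and let $P_i \ne P_0$, $i = 1, 2, \ldots, k$, be points in the
disc. Then for $\delta > 0$ and $K > 0$, there exists $P_0^*$ in the
disc, such that (i) $d(P_0, P_0^*) < \delta$, and (ii)

$$\sum_{i=1}^{k} d(P_0, P_i)^2 > \sum_{i=1}^{k} d(P_0^*, P_i)^2 + K\, d(P_0, P_0^*)^2, \tag{21}$$

where $d(P,Q)$ is the Euclidean distance between $P$ and
$Q$. □

Proof. Denote the angle spanned by the vectors $P_0P_i$
and $P_0P_j$ as $\langle P_i, P_0, P_j \rangle$. Let $\alpha = \max\{\langle P_i, P_0, P_j \rangle,\ i,j
= 1, 2, \ldots, k\}$. Let $Q'P_0Q$ be the bisector of $\alpha$. Since $P_i$ is
in the disc, $\alpha < \pi$, $\langle P_i, P_0, Q \rangle < \pi/2$, and $\langle P_i, P_0, Q' \rangle
= \pi - \langle P_i, P_0, Q \rangle > \pi/2$. Let $P_0^*$ be a point on $P_0Q$
such that $d(P_0, P_0^*) < \delta$. Then in the triangle $P_0P_0^*P_i$,

$$d(P_0, P_i)^2 = d(P_0^*, P_i)^2 + d(P_0, P_0^*)^2 - 2\, d(P_0^*, P_i)\, d(P_0, P_0^*) \cos\langle P_0, P_0^*, P_i \rangle.$$

Since $\lim_{P_0^* \to P_0} \langle P_0, P_0^*, P_i \rangle = \langle P_i, P_0, Q' \rangle > \pi/2$, and
$\lim_{P_0^* \to P_0} P_iP_0^* = P_iP_0$, then
for sufficiently small $d(P_0, P_0^*)$, it is seen that
$-2\, d(P_0^*, P_i) \cos\langle P_0, P_0^*, P_i \rangle > K\, d(P_0, P_0^*)$. Therefore,
$-2\, d(P_0^*, P_i)\, d(P_0, P_0^*) \cos\langle P_0, P_0^*, P_i \rangle > K\, d(P_0, P_0^*)^2$, and
$d(P_0, P_i)^2 > d(P_0^*, P_i)^2 + K\, d(P_0, P_0^*)^2$. Taking summation over all $i$, we
have Inequality 21. □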

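We are ready to prove

Lemma 4.2 A DSS is regular. □

Proof. We prove by contradiction. We assume, on the
contrary, that there exists a DSS $x^*$, which is not regular,
i.e.,

$$\mu(x^*) = \sum_{i,j} \left( s_{ij}^* + \lambda\, r_{ij}^* \right), \tag{22}$$

where

$$s_{ij}^* = \left[ (\xi_{i+1,j}^* - \xi_{ij}^*)^2 + (\xi_{i,j+1}^* - \xi_{ij}^*)^2 + (\eta_{i+1,j}^* - \eta_{ij}^*)^2 + (\eta_{i,j+1}^* - \eta_{ij}^*)^2 \right] / h^2,$$

$$r_{ij}^* = \left[ R(\xi_{ij}^*, \eta_{ij}^*) - E_{ij} \right]^2,$$

and there exists $(\xi_{ij}^*, \eta_{ij}^*)$, such that $\xi_{ij}^{*2} + \eta_{ij}^{*2} = 4$.
Since $(ih, jh)$ is an interior point of the region $D$, $E_{ij} > 0$.

We have

$$\mu(x^*) = F(x^*) + \lambda \left[ R(\xi_{ij}^*, \eta_{ij}^*) - E_{ij} \right]^2 + \Big\{ \left[ (\xi_{i+1,j}^* - \xi_{ij}^*)^2 + (\eta_{i+1,j}^* - \eta_{ij}^*)^2 \right] + \left[ (\xi_{i,j+1}^* - \xi_{ij}^*)^2 + (\eta_{i,j+1}^* - \eta_{ij}^*)^2 \right] + \left[ (\xi_{i-1,j}^* - \xi_{ij}^*)^2 + (\eta_{i-1,j}^* - \eta_{ij}^*)^2 \right] + \left[ (\xi_{i,j-1}^* - \xi_{ij}^*)^2 + (\eta_{i,j-1}^* - \eta_{ij}^*)^2 \right] \Big\} / h^2, \tag{23}$$

where $F(x^*)$ does not contain $\xi_{ij}^*$ and $\eta_{ij}^*$.

Let $d(k,l;i,j)$ be the Euclidean distance between
$(\xi_{k,l}^*, \eta_{k,l}^*)$ and $(\xi_{ij}^*, \eta_{ij}^*)$. Then

$$\mu(x^*) = F(x^*) + \lambda \left[ R(\xi_{ij}^*, \eta_{ij}^*) - E_{ij} \right]^2 + \left[ d^2(i+1,j;i,j) + d^2(i,j+1;i,j) + d^2(i-1,j;i,j) + d^2(i,j-1;i,j) \right] / h^2. \tag{24}$$

Since $E_{ij} > 0$, $R(\xi_{ij}^*, \eta_{ij}^*) = 0$ and $R$ is
continuous, there exists a disc, centered at $(\xi_{ij}^*, \eta_{ij}^*)$ with
sufficiently small radius $\delta$, such that for all $(\xi, \eta)$ inside the
disc, $[R(\xi, \eta) - E_{ij}]^2 \le [R(\xi_{ij}^*, \eta_{ij}^*) - E_{ij}]^2 = E_{ij}^2$.

Let $P = \{(i+1,j), (i-1,j), (i,j+1), (i,j-1)\}$, and let $A =
\{(k,l) \in P: d(k,l;i,j) > 0\}$ and $K = |P - A|$. We analyze
the following two cases, and arrive at a contradiction for
each case.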


Case 1. $K < 4$, i.e., there exists at least one $(k,l)$,
such that $d(k,l;i,j) > 0$. By applying Lemma 4.1, with $P_0 =
(\xi_{ij}^*, \eta_{ij}^*)$ and $\{P_i\} = A$, we know that there exists
$(\xi^0, \eta^0)$, inside the disc, centered at $(\xi_{ij}^*, \eta_{ij}^*)$, with radius
$\delta$, such that

$$\sum_{(k,l) \in P} d(k,l;i,j)^2 = \sum_{(k,l) \in A} d(k,l;i,j)^2 > \sum_{(k,l) \in A} d^0(k,l;0,0)^2 + K\, d^0(i,j;0,0)^2 = \sum_{(k,l) \in A} d^0(k,l;0,0)^2 + \sum_{(k,l) \in P - A} d^0(k,l;0,0)^2 = \sum_{(k,l) \in P} d^0(k,l;0,0)^2,$$

where $d^0(k,l;0,0)$ is the Euclidean distance between
$(\xi_{k,l}^*, \eta_{k,l}^*)$ and $(\xi^0, \eta^0)$.

Replacing $(\xi_{ij}^*, \eta_{ij}^*)$ by $(\xi^0, \eta^0)$, from Equation 24,
we have $x^0$, such that $\mu(x^0) < \mu(x^*)$. Therefore, $x^*$ is
not a DSS, a contradiction.

Case 2. $K = 4$, i.e., $d(k,l;i,j) = 0$, $\forall (k,l) \in P$. We
separate out all the grid points adjacent to any of the
points in $P$. We denote this set of grid points by $P^{(1)}$. We
analyze $P^{(1)}$ in a similar way as $P$, and we will arrive
either at a contradiction or at the conclusion that $d(k,l;i,j)
= 0$, $\forall (k,l) \in P \cup P^{(1)}$, i.e., $(\xi_{k,l}^*, \eta_{k,l}^*) = (\xi_{ij}^*, \eta_{ij}^*)$, $\forall
(k,l) \in P \cup P^{(1)}$. We repeat the same arguments as we
expand the region of grid points with identical $(\xi^*, \eta^*)$. Since
$D$ is a unit square region and there exist at least two
boundary points on which the (fixed) surface orientations are
different, we will arrive at a contradiction no later than
when the expanded region covers these two points.

Therefore, an irregular DSS does not exist. This
completes the proof. □

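We assume that $\{R(\xi, \eta) - E_{ij}\}\, \partial R(\xi, \eta)/\partial \xi$ and
$\{R(\xi, \eta) - E_{ij}\}\, \partial R(\xi, \eta)/\partial \eta$ are Lipschitz functions, i.e.,

$$\left| \{R(\xi,\eta) - E_{ij}\} R_\xi(\xi,\eta) - \{R(\xi',\eta') - E_{ij}\} R_\xi(\xi',\eta') \right| \le L_{ij}^{(1)} \left[ (\xi - \xi')^2 + (\eta - \eta')^2 \right]^{1/2}$$

and

$$\left| \{R(\xi,\eta) - E_{ij}\} R_\eta(\xi,\eta) - \{R(\xi',\eta') - E_{ij}\} R_\eta(\xi',\eta') \right| \le L_{ij}^{(2)} \left[ (\xi - \xi')^2 + (\eta - \eta')^2 \right]^{1/2}. \tag{25}$$

Let $\max_{i,j} \{L_{ij}^{(1)}, L_{ij}^{(2)}\} = \nu_0$. Then we have

Theorem 4.1 For $\lambda \in [0, 2\pi^2 \nu_0^{-1} (1 - \pi^2 h^2/24)^2)$, there
exists a unique DSS minimizing $\mu$ in Equation 4, which is
also the unique solution of Equation 15, and the algorithm
in Equation 20 converges to that DSS. □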

Proof. Since a DSS exists and is regular, Equation 15 is
a necessary condition of a DSS. Let $x^*$ be a regular DSS.
Then $x^* = -\lambda h^2 M^{-1} b(x^*)$. From Equation 20 we have
$x^{(m+1)} - x^* = -\lambda h^2 M^{-1} [b(x^{(m)}) - b(x^*)]$, and so
$\|x^{(m+1)} - x^*\|_2 \le \lambda h^2 \|M^{-1}\|_2\, \nu_0\, \|x^{(m)} - x^*\|_2$. Since
$\|M^{-1}\|_2 = [8 \sin^2(\pi h/2)]^{-1} \le [2\pi^2 h^2 (1 - \pi^2 h^2/24)^2]^{-1}$,

$$\|x^{(m+1)} - x^*\|_2 \le \lambda h^2 [2\pi^2 h^2 (1 - \pi^2 h^2/24)^2]^{-1} \nu_0 \|x^{(m)} - x^*\|_2 = \lambda [2\pi^2 (1 - \pi^2 h^2/24)^2]^{-1} \nu_0 \|x^{(m)} - x^*\|_2.$$

Since $\lambda < 2\pi^2 \nu_0^{-1} (1 - \pi^2 h^2/24)^2$,
$\lambda [2\pi^2 (1 - \pi^2 h^2/24)^2]^{-1} \nu_0 < 1$. Therefore, $x^{(m)}$
converges to $x^*$.

Since $\{x^{(m)}\}$ has only one limit and Equation 15 is a
necessary condition satisfied by a regular DSS, $x^{(m)}$
converges to the unique solution of Equation 15, which is
the unique DSS. □

5.  An  Example 

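As an example, we estimate $\nu_0$ and the range of $\lambda$ for
the case of a Lambertian surface with the incident rays
coincident with the view direction. In this case, the
image-irradiance equation [7] is

$$R(\xi, \eta) = (4 - \xi^2 - \eta^2)/(4 + \xi^2 + \eta^2), \tag{26}$$

where $\xi^2 + \eta^2 \le 4$. Obviously, $R$ is a Lipschitz function
and

$$\nu_0 \le \left\{ \sup \left| \partial/\partial\xi \{ [R(\xi,\eta) - E_{ij}]\, \partial R(\xi,\eta)/\partial\xi \} \right|^2 + \sup \left| \partial/\partial\eta \{ [R(\xi,\eta) - E_{ij}]\, \partial R(\xi,\eta)/\partial\eta \} \right|^2 \right\}^{1/2}. \tag{27}$$

Since $|\partial/\partial\xi \{ [R - E_{ij}]\, \partial R/\partial\xi \}| = |[\partial R/\partial\xi]^2 +
(R - E_{ij})\, \partial^2 R/\partial\xi^2| \le |\partial R/\partial\xi|^2 + |R - E_{ij}|\, |\partial^2 R/\partial\xi^2|$,
we only need to estimate the bounds of the absolute
values of $\partial R/\partial\xi$ and $\partial^2 R/\partial\xi^2$.

Since $\partial R/\partial\xi = -16\xi/(4 + \xi^2 + \eta^2)^2$, one checks that
$|\partial R/\partial\xi| \le 3\sqrt{3}/8$.

On the other hand, $\partial^2 R/\partial\xi^2 = 16(3\xi^2 - \eta^2 - 4)/(\xi^2 +
\eta^2 + 4)^3$. One checks that $|\partial^2 R/\partial\xi^2| \le 1/4$.

Since $0 \le R, E_{ij} \le 1$, $|R - E_{ij}| \le 1$. Thus
$|\partial/\partial\xi \{ (R(\xi,\eta) - E_{ij})\, \partial R(\xi,\eta)/\partial\xi \}| \le 27/64 + 1/4 = 43/64$.

A similar analysis yields $|\partial/\partial\eta \{ (R(\xi,\eta) - E_{ij})\, \partial R(\xi,\eta)/\partial\eta \}|
\le 27/64 + 1/4 = 43/64$.

From Equation 27,

$$\nu_0 \le 43\sqrt{2}/64. \tag{28}$$

Thus for $\lambda \in [0, 64\sqrt{2}\, \pi^2 (1 - \pi^2 h^2/24)^2/43)$, the
algorithm in Equation 20 converges to the unique DSS.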
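
The quantities that control the admissible range of $\lambda$ can be evaluated with a short helper. This is only a sketch with hypothetical function names; the Lipschitz constant `nu0` is passed in as a parameter, so the value derived in the example above (or any other estimate) can be supplied by the caller.

```python
import math

def inverse_norm_exact(h):
    """Spectral norm of A^{-1}: reciprocal of the smallest eigenvalue of A."""
    return 1.0 / (8.0 * math.sin(math.pi * h / 2.0) ** 2)

def inverse_norm_bound(h):
    """Upper bound on that norm used in the proof of Theorem 4.1."""
    return 1.0 / (2.0 * math.pi**2 * h**2 * (1.0 - math.pi**2 * h**2 / 24.0) ** 2)

def lambda_upper_limit(h, nu0):
    """Right end of the interval of lambda in Theorem 4.1."""
    return 2.0 * math.pi**2 * (1.0 - math.pi**2 * h**2 / 24.0) ** 2 / nu0

def contraction_factor(lam, h, nu0):
    """Factor by which the error bound shrinks per iteration."""
    return lam * nu0 / (2.0 * math.pi**2 * (1.0 - math.pi**2 * h**2 / 24.0) ** 2)
```

The bound on the norm follows from $\sin x \ge x(1 - x^2/6)$ with $x = \pi h/2$, and the contraction factor is below 1 exactly when $\lambda$ is below `lambda_upper_limit(h, nu0)`.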




6.  Complexity  of  the  Iterative  Algorithm 

From Subsection 3.1 and Equation 15, $M^{-1}$ is known.
Therefore, to implement the algorithm in Equation 20, one
has to multiply the $2N \times 2N$ dense matrix $M^{-1}$ by a vector,
which costs $O(N^2)$ using the conventional matrix
multiplication. However, as discussed in Subsection 3.1, we
can use FFT to reduce the cost to $O(N \log N)$.

Let $x^*$ be the unique solution of Equation 15 and let
$\lambda \nu_0 [2\pi^2 (1 - \pi^2 h^2/24)^2]^{-1} = \theta < 1$. Then by a similar
argument as in the proof of Theorem 4.1, we have
$\|x^{(m)} - x^*\|_2 \le \theta\, \|x^{(m-1)} - x^*\|_2$, and therefore,

$$\|x^{(m)} - x^*\|_2 \le \theta^m\, \|x^{(0)} - x^*\|_2. \tag{29}$$

As an example, we derive the number of iterative steps to
compute a solution of Equation 15 with error bound $O(h)$.
Let $k$ be the number of steps required, then we have

$$\theta^k = h,$$

$$k = \log h / \log \theta.$$

If $\theta = 1/2$, then $k = (\log N)/2$. Thus

Proposition 6.1 For $\lambda \in [0, \pi^2 \nu_0^{-1} (1 - \pi^2 h^2/24)^2]$, it
takes $(\log N)/2$ steps for $x^{(m)}$ to converge to the solution
of Equation 15 with error $h$. The total cost, using FFT for
matrix multiplication, is thus $O(N (\log N)^2)$.
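
The step count can be illustrated with a small helper. It is a sketch only: the function name is hypothetical, the initial error is normalized to 1, and the example takes $h \approx N^{-1/2}$, which the text's $K + 1 = h^{-1}$, $N = K^2$ satisfies only approximately.

```python
def steps_to_reach(theta, tol):
    """Smallest k with theta**k <= tol, for a contraction factor 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    k, err = 0, 1.0
    while err > tol:
        err *= theta   # one iteration shrinks the error bound by theta
        k += 1
    return k
```

For instance, with `theta = 0.5` and `tol = 1/16` (so that N is about 256 under the stated approximation), the helper returns 4, which agrees with half of the base-2 logarithm of 256.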

7.  Interpolating  Spline  and  its  Optimality 

When the data are noisy, the spline-smoothing
approach is appropriate. However, when the data are
relatively precise, the interpolating spline approach is
preferable. In that approach, one seeks a spline that
interpolates the data and minimizes Equation 2. This
approach is also briefly discussed in [7]. This is an
interpolatory algorithm and is therefore almost strongly
optimal, i.e., strongly optimal within a factor of 2; see [12],
Chapter 1. The uniqueness of the spline and its
construction need further investigation.


8.  Conclusion 

We studied an algorithm for solving the system of
equations for shape from shading, using spline-smoothing,
and we discussed its convergence and complexity. We proved
the existence and the uniqueness of the smoothing-spline and
the solution of the system of equations involved. However,
the work is far from being completed. There are a number of
aspects which deserve further investigation.

The image domain we use is a square region, and from
the practical point of view, it is too restrictive. An efficient
algorithm for general image domains remains to be explored.
We assume that the surface orientations are known on the
image boundaries. This is not always the case, because of
sharp edges on the boundaries or noise in the data. Finding
a robust algorithm with noisy or incomplete information on
the boundaries is an interesting research problem. We
studied the algorithm under the assumption that the penalty
parameter $\lambda$ is within a bound. For arbitrary $\lambda$, the
existence and the uniqueness of the solution of the system of
equations involved and the convergence of the algorithm
need further study. Finally, our algorithm remains to be
tested to characterize its performance on real data.

Acknowledgements 

The author is grateful to J. R. Kender, G. W.
Wasilkowski, and H. Wozniakowski for their advice during
this work. The valuable comments from H. Trickey are
deeply appreciated.




References 


1. Bruss, A. R. "Some Properties of Discontinuities in the
Image Irradiance Equation." AI Memo 511, AI Lab, MIT
(1979).

2. Horn, B.K.P. "Shape from Shading: a Method for
Obtaining the Shape of a Smooth Opaque Object from One
View." Technical Report MAC-TR-79, Project MAC, MIT
(1970).

3. Horn, B.K.P. Determining Shape from Shading. In The
Psychology of Computer Vision, Winston, P.H., Ed.,
McGraw-Hill, New York, 1975, ch. 4.

4. Horn, B.K.P. "Understanding Image Intensities."
Artificial Intelligence 8, 2 (April 1977), 201-231.

5. Horn, B.K.P., Woodham, R.J. and Silver, W.M.
"Determining Shape and Reflectance Using Multiple Images."
AI Memo 490, AI Lab, MIT (1979).

6. Horn, B.K.P., and Sjoberg, R.W. "Calculating the
Reflectance Map." Applied Optics 18 (June 1979),
1770-1779.

7. Ikeuchi, K. and Horn, B.K.P. "Numerical Shape from
Shading and Occluding Boundaries." Artificial Intelligence
17 (1981), 141-184.

8. Laurent, P. J. Approximation et optimisation.
Hermann, Paris, 1972.

9. Nicodemus, F. E., Richmond, J. C., Hsia, J. J.,
Ginsberg, I. W., and Limperis, T. "Geometrical
Considerations and Nomenclature for Reflectance." NBS
Monograph 160, US Department of Commerce, National
Bureau of Standards (1977).

10. Silver, W. "Determining Shape and Reflectance Using
Multiple Images." Master's Th., Dept. of E. E. and C. S., MIT
(1980).

11. Smith, G. D. Numerical Solution of Partial
Differential Equations: Finite Difference Methods. Oxford
University Press, 1978.

12. Traub, J. F., and Wozniakowski, H. A General
Theory of Optimal Algorithms. Academic Press, 1980.

13. Woodham, R.J. A Cooperative Algorithm for
Determining Surface Orientations from a Single View.
Proceedings of the Fifth International Joint Conference on
Artificial Intelligence, Aug. 1977, pp. 635-641.

14. Woodham, R.J. Reflectance Map Techniques for
Analyzing Defects in Metal Castings. Ph.D. Th., MIT
Artificial Intelligence Lab., June 1978. Available as
AI-TR-457.

15. Woodham, R.J. "Photometric Method for Determining
Surface Orientation from Multiple Images."
Optical Engineering 19(1) (1980), 139-144.




GENERALIZED CONE DESCRIPTIONS
FROM
SPARSE 3-D DATA*

Kashipati G. Rao and R. Nevatia

Intelligent Systems Group
Departments of Electrical Engineering
and Computer Science
Powell Hall Room 234
University of Southern California
Los Angeles, CA 90089-0273

ABSTRACT 

This paper presents an approach to describing
objects as Generalized Cones (GCs) starting from sparse
3-D data, such as that obtained from stereo. The current
method is best suited to Linear Straight Homogeneous
GCs [1], though we believe it could be extended to more
complex GCs. The method has been tested on a number of
synthetic images and results for some are presented.

1. INTRODUCTION

Segmentation of a scene into objects, or the
resolution of the so called "figure-ground" problem, is one
of the key problems in computer vision. This problem has
been solved for polyhedral scenes if perfect line drawings
are given as input. Methods available for more general
objects are highly restricted so far. The usual approach to
cope with the imperfections in the low-level descriptions
is to assume that the specific objects to be viewed, and
even their approximate orientations, are known a priori,
and then segmentation is performed by fitting the models
to the low-level descriptions. The various systems differ in
the specificity with which the object models must be
known.

In this paper we propose an approach to segmenting
scenes assuming that the objects to be viewed are well
described as generalized cones. The current method is
best suited to "linear, straight, homogeneous" generalized
cones (in Shafer's terminology [1]), though we believe that
the method generalizes to a much broader class of
objects. We assume that our low-level descriptions
consist of sparse 3-D data, as might be generated by a
stereo system, for example. Thus, information is available
only at the intensity discontinuities, typically at the object
boundaries, surface discontinuities and surface markings.
We do not assume that the 3-D data is available
everywhere on the boundary, i.e., we must reason with
incomplete and imperfect data.


* This research was supported, in part, by the Defense Advanced
Research Projects Agency and was monitored by the Air Force Wright
Aeronautical Laboratories under contract F33615-84-K-1404, DARPA order
no. 3119.


2. A REVIEW OF GENERALIZED CONE METHODS

We will not attempt a very detailed or complete
description of the work on generalized cones here; a
tutorial treatment may be found in [2]. Generalized cones
were introduced by Binford as useful volume descriptions
for 3-D objects [3]. A Generalized Cone (GC) consists of
an arbitrary planar shape, called a cross-section,
swept along an arbitrary 3-D curve, called an axis.
Further, the size and also shape of the cross-section may
change along the axis; the rule describing the change is
called the cross-section function.

Thus, a normal cylinder consists of a circular cross-section
and a straight axis with no change in shape or size of the
cross-section; for a cone, the size changes linearly along
the axis. The precise restrictions for a GC have been
different for various researchers, e.g. some do not allow the
cross-section shape to change. Shafer has developed a
terminology for describing the variants of a generalized
cone [1]; we shall follow this terminology where appropriate.

In this terminology a linear cone has a linear
cross-section function, a straight cone has a straight
axis, and a homogeneous cone has an invariant cross-section
shape. We shall exploit some special properties of
a Linear, Straight, Homogeneous Generalized Cone
(LSHGC).
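The definition above can be made concrete with a small sketch. It sweeps a fixed planar cross-section along a straight axis while scaling it linearly, which is one way to sample an LSHGC. The function name `lshgc_slices`, its parameters, and the choice of a cross-section plane normal to the axis (a right GC) are illustrative assumptions, not part of the original system.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lshgc_slices(cross_section, axis_start, axis_dir, length,
                 scale_start, scale_end, n_slices=5):
    """Sample cross-section slices of a linear, straight, homogeneous GC.

    cross_section: list of (a, b) points of the fixed planar shape.
    The axis is the straight line from axis_start along axis_dir.
    The size varies linearly from scale_start to scale_end (hypothetical
    parameterization chosen for this sketch).
    """
    d = _normalize(axis_dir)
    # Build two unit vectors u, v spanning the plane normal to the axis.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(d, helper))
    v = _cross(d, u)
    slices = []
    for i in range(n_slices):
        t = i / (n_slices - 1)
        scale = scale_start + t * (scale_end - scale_start)  # linear rule
        centre = tuple(axis_start[k] + t * length * d[k] for k in range(3))
        slices.append([
            tuple(centre[k] + scale * (a * u[k] + b * v[k]) for k in range(3))
            for a, b in cross_section
        ])
    return slices
```

With `scale_start == scale_end` the sketch yields a cylinder-like solid; with `scale_end == 0` the slices shrink to an apex, as for a cone.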

While it has been generally accepted that GCs have
many advantages as a representation of shape, it is
difficult to derive these descriptions from the scenes. In
previous work, Nevatia and Binford [4] used the boundaries
derived from 3-D range data; the method would apply to
2-D boundaries also, but in either case requires that
boundaries be complete. Marr describes another method
for determining axes, also from complete boundaries [5].
Brooks deals with imperfect data in his ACRONYM
system [6], but this requires rather detailed knowledge of
the object being viewed, and some knowledge of the
viewing position.




Deriving GC descriptions from a monocular image is
a complex task. If we are able to gather 3-D data, from an
active range finder, or stereo, some of the difficulties of
segmentation disappear. In practice, however, it is difficult
to get perfect range data everywhere. For stereo analysis
in particular, it is possible to get range data for only the
non-homogeneous parts of the image. Thus, for smooth
objects, we may be able to get this information at object
boundaries only. Depth at other points may be determined
by interpolation (e.g. see [7, 8]); however, good
interpolation requires the knowledge of surface
discontinuities itself, i.e., the solution of the object
segmentation problem. (In fact we were motivated to this
problem in trying to devise an interpolation scheme for
our stereo program.) Even for an active range finder, there
may be many spots where range data cannot be reliably
obtained, e.g. due to color or the reflectivity properties of
the surface at these points.

3. OUR APPROACH

We assume that we have sparse 3-D data for a
scene, as may be expected from a line or edge based
stereo system (such as [9]); i.e., we have 3-D data
available at the points that are detected as edges and are
found to have a corresponding match in the stereo pair.

The boundaries in any scene may be considered to
be in the following classes (see figure 1 for an example):

1. Occluding boundaries:

This is a boundary where the visible surface is
all to one side of the boundary (shown by "1"
in figure 1). Previous contour analysis, of
Nevatia and Binford and of Marr, essentially
assumes that these are the only visible
contours.

2. Surface slope discontinuities:

These may be caused by slope discontinuities
in the cross-section shape, such as along the
corners of the cross-sections of a polyhedral
object, or by a "terminator" (i.e. the end) of a
GC. (The slope discontinuities are shown by "2"
in figure 1.) These kinds of boundaries would
cause difficulties in the earlier methods of
analysis, but we will show how they may
provide very valuable pieces of information.

3. Surface markings:

These are caused by changes in the surface
reflectance rather than the surface position or
slope ("3" in figure 1). In simplistic analysis,
these may be confused with occluding
boundaries.

4. Others:

Other sources are due to noise, shadows,
highlights etc. Our approach does not deal
with them explicitly, but should work in their
presence (we can essentially consider them to
be the same as surface markings in our analysis).

1. Occluding boundaries
2. Surface orientation discontinuities
3. Surface markings

Figure 1: Two Occluding Objects And A Classification Of The Boundaries


For GCs, the important boundaries could be
alternatively classified as those produced from "contour
generators (cgs)" and "terminators". Intuitively, contour
generators are the extremal points on the surface which
enclose the visible surface (and are thus view-point
dependent). For a smooth GC, the contour generators are
the points on the surface where the line of sight is
tangential to the viewed surface. More generally, the
contour generators are the loci of the extremal points on
the cross-sections. (In this, our definition is slightly
different from that used in Shafer, but has the same
intuitive notion. Also, we will not distinguish between
"contours", which are projections in the image of contour
generators, and contour generators themselves, unless
necessary to do so, as we use 3-D boundaries.) Contour
generators for a simple cylinder are shown in figure 2.
Note that in our terminology, contour generators are a
part of the occluding boundaries. In an LSHGC we usually
have 2 segments for the contour generator separated by
the terminators. These shall be referred to as cg1 and
cg2.

The terminators of a GC are simply its ends (imagine
an infinite GC cut at a point). Note that the cut, and
hence a terminator, need not be planar, and when planar, it
need not be normal to the axis. (Thus, we prefer to
describe a right, circular cone with a slanted cut as a
straight, homogeneous GC with an oblique termination,
rather than as a homogeneous GC with cross-section
changing shape at the end, i.e. our descriptions are
necessarily in terms of right GCs.) Terminator boundaries
for a simple cylinder are shown in figure 2; we may note
that the terminator may share part of the occluding
boundary. The terminators have been a source of
difficulty in analysis of boundaries, as in [4]; however, they
can provide valuable clues to the shapes of the
cross-sections.

CG: Contour Generator
T: Terminator Boundary

Figure 2: Another Classification For GC Boundaries

The scene segmentation problem may now be
considered to be that of isolating the contour generators
and terminators of the GCs present in a scene. Also,
having these boundaries goes a long way towards
describing the GCs: the axis comes from the axis of
symmetry of the contour generators and the cross-section
shape comes from the terminators under certain
conditions.

The key to our approach consists of the following
observations about boundaries of GCs:

a) The contour generator is tangential (in 3-D) to the
terminator boundaries, and 3-D tangency also implies a 2-D
tangency. Further, the contour generator (and thus the
entire object) must be to one side of the plane containing
the local contour tangent and the viewing point. (In 2-D,
the terminator boundary must be all on one side of the
contour at the junction.)

b) For a linear, straight, homogeneous GC (LSHGC),
the contour generator is planar from any view (established
by Shafer [1]).


For a non-linear SHGC, the contour generators are
planar in a side view but not necessarily in an oblique
view. Properties of contour generators of unconstrained
GCs are not known, but we expect that they can be
approximated by piecewise LSHGCs, giving rise to
"piecewise coplanar" contour generators. In our current
implementation, we have tested only the LSHGCs, but
believe that the methods will extend to elongated GCs by
using the piecewise approximation.

Our approach, then, is to find terminators and
contour generators by testing for the above properties. In
the current implementation we search for all alternatives
and then choose the preferred descriptions based on
certain preference criteria. In general, the search itself will
need to be constrained for more complex scenes. The
details of the method and a discussion follow.

4. DETAILS OF THE METHOD

The input to this system is a set of 3-D line
segments; the output is a set of descriptions of objects
using the GC representation.

The block diagram of the method is given in figure
3; details are in the following sub-sections. As an
example, we shall consider figure 4, which is a projection
of hypothetical 3-D sparse data for a scene with a
cylinder. We assume the 3-D segments 1 and 6 lie in the
plane of the figure.

4.1.  Preliminary  Processing 

• Find Lines: Line segment information comes from
a lower level program and consists of the positions of
end-points of each line in 2-D and in 3-D. Other
attributes, such as length, are computed.

• Line Relations: Relationships between every
pair of line segments, like coplanarity, parallelism and
convergence (meeting at a junction), are found.

• Find Junctions: A junction is formed when several
segments meet at a point or when the distance between
their extremities is within a preset threshold.
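The pairwise relations named above can be sketched with elementary vector geometry. The function names, the tolerance values, and the representation of a segment as a pair of 3-D end-points are assumptions made for illustration; the text does not specify how the original program computes these relations.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def are_parallel(seg1, seg2, angle_tol_deg=2.0):
    """True when the segment directions differ by at most angle_tol_deg."""
    d1, d2 = sub(seg1[1], seg1[0]), sub(seg2[1], seg2[0])
    sin_angle = norm(cross(d1, d2)) / (norm(d1) * norm(d2))
    return sin_angle <= math.sin(math.radians(angle_tol_deg))

def are_coplanar(seg1, seg2, dist_tol=1e-6):
    """True when the two supporting lines lie (nearly) in one plane."""
    d1, d2 = sub(seg1[1], seg1[0]), sub(seg2[1], seg2[0])
    w = sub(seg2[0], seg1[0])
    n = cross(d1, d2)
    if norm(n) < 1e-12:
        return True  # parallel lines always share a plane
    # Distance between the two supporting lines along their common normal.
    return abs(dot(w, n)) / norm(n) <= dist_tol

def form_junction(seg1, seg2, threshold):
    """True when some pair of extremities is within the preset threshold."""
    return min(norm(sub(p, q)) for p in seg1 for q in seg2) <= threshold
```

For noisy stereo data, `dist_tol` would presumably be loosened so that segments within a thin slab still count as coplanar.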

4.2. Searching For Possible GCs

We next search for groups of line segments that can
be looked upon as GCs. Our approach consists of
searching for some evidence of existence of a GC and
then verifying it with other evidence. Our search can
proceed in one of the following two modes:

1. Find contour generators first and verify by
finding the corresponding terminator(s) (called
the Contour Generator Directed Method).

2. Find the terminator(s) first and verify by finding
the corresponding contour generator(s) (called
the Terminator Directed Method).


[Figure 3, block diagram: sparse 3-D information -> Preliminary
Processing -> Searching for GCs, by the Contour Generator (cg)
Directed Method (find cgs first, then terminators) and the
Terminator Directed Method (find terminators first, then cgs)
-> Choosing GCs -> Describing Objects as GCs -> GC descriptions.
GC: Generalized Cone; cg: contour generator.]

Figure 3: Block Diagram Of The Method

Figure 4: Illustrating The Method With A Cylinder


These two methods are described in detail below.
At this time, we use both methods to give a list of
possible GCs and choose among them by some preference
criteria (given later). A complete search can be expensive
for a complex scene and will need to be guided by results
of a partial analysis. We have not investigated the issues
of controlling such a search yet; our approach will be that
the "strongest" evidence is to be used first and that,
generally, sufficient strong evidence can be found that
exhaustive search is unnecessary.

4.2.1. The Contour Generator Directed Method

Each pair of coplanar line segments is a possible
pair of contour generators. To prune the search space, we
consider only those segments whose length is greater
than the average-segment-length (average length of
segments in the scene). For each such pair, we look for
the corresponding terminator(s), by the following method:

• Find those segments which, when projected to the
contour generator plane, lie in between the contour
generators. Only such segments could belong to the
terminators as the cgs are, by definition, the extremities
of the object from a certain point of view. In figure 4, if
we consider (1, 6) as possible cgs then all but segments
18, 19, 20 and 21 would be in between the cgs and could
belong to the terminators of (1, 6).

• Find begin and end segments of terminators: For
each end point of each contour generator and for each
side of the contour generator plane, pick the segment one
of whose extremities is closest to the cg end point. This
is justified because on each side of the cg plane there is a
unique begin/end segment associated with each end point
of a contour generator. Note that we use this method
rather than connectivity because, in general, the junction
between the cg and the terminator may be missing.
Segments associated with one of the contour generators
are called begin segments and those associated with the
other cg are called end segments. In figure 4, segment 11
is chosen as a begin segment rather than segment 25
because 11 is closer to the extremity of the contour
generator segment 1. The other begin segments found
are 10 and 2; the end segments found are 7, 14 and 5.

• Starting from each begin segment, trace the
possible terminator till an end segment is reached, using
the algorithm for tracing a contour explained later, in
paragraph 4.2.3. For example, in figure 4 we get (2, 3, 4, 5)
as a possible terminator.
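The first two steps of this method, pruning by length and pairing coplanar segments, then picking the segment whose extremity is nearest a cg end point, might be sketched as follows. The dictionary of segments keyed by id, the helper names and the tolerance are hypothetical; the projection test and the per-side selection described above are omitted for brevity.

```python
import math
from itertools import combinations

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def coplanar(s1, s2, tol=1e-6):
    """Supporting lines of s1 and s2 lie in one plane (within tol)."""
    d1, d2, w = _sub(s1[1], s1[0]), _sub(s2[1], s2[0]), _sub(s2[0], s1[0])
    n = _cross(d1, d2)
    size = math.sqrt(sum(c * c for c in n))
    if size < 1e-12:
        return True  # parallel directions
    return abs(sum(a * b for a, b in zip(w, n))) / size <= tol

def candidate_cg_pairs(segments):
    """Pairs of coplanar segments that are longer than the scene average."""
    lengths = {i: math.dist(p, q) for i, (p, q) in segments.items()}
    average = sum(lengths.values()) / len(lengths)
    long_ids = [i for i in segments if lengths[i] > average]
    return [(i, j) for i, j in combinations(long_ids, 2)
            if coplanar(segments[i], segments[j])]

def closest_segment(cg_end_point, candidate_ids, segments):
    """Candidate one of whose extremities is closest to the cg end point."""
    return min(candidate_ids,
               key=lambda i: min(math.dist(cg_end_point, e)
                                 for e in segments[i]))
```

Using proximity rather than connectivity in `closest_segment` mirrors the remark above that the junction between a cg and its terminator may be missing.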

4.2.2. The Terminator Directed Method

We find ordered sets of segments that are potential
terminators by the following method:

• Find maximal coplanar sets, i.e., the largest set of
segments that are all in the same plane (we consider sets
with 3 or more segments). For noisy data, we find the set
of segments lying inside a thin slab rather than those
lying strictly in a plane. In figure 4 segments (7, 8, 9, 10,
11, 12, 13, 14, 22, 23, 24) would form one maximal
coplanar set.

• Find maximal convex sets: For each coplanar set
found above, we remove those segments that do not lie
on the largest convex hull. A segment is removed if there
are segments lying on both of its sides. For example, 22,
23 and 24 are removed from the maximal coplanar set
found above, to obtain (7, 8, 9, 10, 11, 12, 13, 14). At this
point the segments in the set are not ordered.

• Find the extremal segments: the extremal
segments belong to the two sections of the contour with
minimum angle in the cg plane (this is the linear-segment
version of obtaining the sections of
maximum/minimum curvature of a curve). Note that, as in
an inclined cylinder, we cannot simply consider the
leftmost and the rightmost segments as the extremal
segments of the terminator. We now obtain a pair of
extremal segments per section and, consequently, four for
each terminator. One pair is called the begin segments
and the other pair is called the end segments, for
convenience in tracing the terminator (described below).
Segments 11, 10 are the begin segments and 7, 14 are the
end segments of the maximal convex set obtained above.

• For each set of segments found above, we look
for the corresponding contour generators, i.e., coplanar
segments that do not lie in the terminator plane and are
closest to the extremal segments of the terminator.
Segments 1 and 6 satisfy this for the above set. We then
trace the terminator by starting from each begin segment
and taking only those segments that are in the maximal
convex set until an end segment is reached. For this we
use the tracing procedure explained next. In the example,
the terminator we obtain is (11, 12, 13, 14, 7, 8, 9, 10),
ordered from cg=1 to cg=6.
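The convexity rule quoted above ("a segment is removed if there are segments lying on both of its sides") lends itself to a short sketch. It assumes the coplanar set has already been expressed in 2-D coordinates within its plane; the names `side` and `convex_subset` and the single-pass strategy are illustrative choices rather than the original implementation.

```python
def side(seg, point, eps=1e-9):
    """Return +1, -1 or 0 for the side of seg's supporting line on which
    point lies (2-D coordinates within the terminator plane)."""
    (x1, y1), (x2, y2) = seg
    s = (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)
    if abs(s) <= eps:
        return 0
    return 1 if s > 0 else -1

def convex_subset(segments):
    """Keep the segments that have all other end-points on one side."""
    kept = {}
    for i, seg in segments.items():
        sides = {side(seg, p)
                 for j, other in segments.items() if j != i
                 for p in other}
        if not ({1, -1} <= sides):  # not straddled by other segments
            kept[i] = seg
    return kept
```

For a square boundary with an interior marking, only the interior segment sees end-points on both sides, so it is the one removed.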

4.2.3. Tracing A Contour

Both of the above methods need a technique to
trace the terminator. The tracing problem consists of
starting from a begin segment and linking it to its
neighbors in a chain until an end segment is reached, such
that some constraints are satisfied.²

This is in general a difficult problem because of the
presence of surface markings and other noise segments.
We may also have gaps between segments and hence
cannot always use connectivity. Also, the shape of the
contour is not known a priori, hence standard
transformation techniques like the Hough transform are
not applicable.


² When two chains have their begin and end segments meeting, they
are combined to form a single contour. In the current example, chains
(11, 12, 13, 14) and (10, 9, 8, 7) are combined to form the contour (11,
12, 13, 14, 7, 8, 9, 10).


In our method, to trace the contour, we first use
connectivity in the direction of traversal, i.e., we look for
the next segment as the one connected to the current
segment but away from the previous one. For example, in
figure 4 if the current segment is 11 and the previous
segment is 1 (the cg), the next segment is either 12 or 15.

Choosing From Among Several Segments: At a
junction, we need to choose among many alternative
branches. For example, in figure 4, starting from segment
11, we could choose either 12 or 15 on the basis of
connectivity alone. This problem arises especially when
we have surface markings meeting the terminator as
illustrated in the figure. We would like to ignore the
surface mark (segment 15) and instead continue tracing
the terminator (take segment 12).

We choose the segment most contiguous with the
current segment, i.e., the segment most collinear with the
current one. In the example, 12 is more collinear with 11
than is 15 with 11. We, therefore, choose 12.
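One plausible reading of "most collinear" at a junction is the candidate whose direction makes the smallest angle with the current segment. The sketch below scores candidates by the absolute cosine of that angle; the scoring function and the names used are assumptions, since the text does not give the exact measure.

```python
import math

def direction(seg):
    """Unit direction vector of a segment given as two end-points."""
    d = [b - a for a, b in zip(seg[0], seg[1])]
    n = math.sqrt(sum(c * c for c in d))
    return [c / n for c in d]

def collinearity(current, candidate):
    """|cos| of the angle between the two directions; 1.0 means aligned."""
    return abs(sum(a * b
                   for a, b in zip(direction(current), direction(candidate))))

def most_collinear(current, candidates):
    """Id of the connected candidate best aligned with the current segment."""
    return max(candidates,
               key=lambda name: collinearity(current, candidates[name]))
```

Because the candidates considered here all meet the current segment at the junction, direction alone is used; lateral offset is ignored in this sketch.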

Crossing Gaps: Sometimes, however, we may not
have any segment connected to the current segment in
the direction of traversal, such as at the end of segment
12 in figure 4. To choose the most appropriate next
segment we search in a neighborhood of the current
segment for segments satisfying the following constraints
and preferences:

Basic constraints

• The next segment should be in the
general direction of traversal so far. (It
should be closer to the current
segment than to the previous one and
not have a junction with the previous
segment.) For example, in our current
path (11 and 12), segment 15 does not
qualify but segments 23, 16 and 13,
among other segments, do.

• The next segment should not be a part
of the terminator traced so far.
Segment 11 does not qualify as it is
already part of the terminator.

Additional constraints

(applicable only if we have already found
contour generators)

• The next segment should lie in
between the 2 contour generators
being considered. For example,
segments 20, 21, 18 and 19 would not
qualify.

• The next segment should lie on the
same end of the contour generators as
the current and previous segments.
Two segments lie on the same end of
the cgs if they lie on the same side (in
the cg plane) of the straight line
through the midpoints of the cgs. In
this example, the next segment and
segments 11, 12 should lie on the
same end of the cgs 1 and 6. Thus
segments 4, 17, 5 etc. do not qualify
as the next segment but 13, 14, 6, 7,
24 and 26 do.

• The next segment should also lie on
the same side of the terminator as the
current segment, i.e., the same side of
the contour generator plane as the
current segment. In this example,
segments (2, 3, 4, 5, 11, 12, 13, 14, 15,
16, 17, 25, 26) lie on one side of the
contour generator plane whereas (7, 8,
9, 10) all lie on the other side of the
plane. When the current segment is
12 and the previous one is 11, the next
segment should lie on the same side
of the cg plane as 11 and 12, i.e., it
should belong to the set (2, 3, ...,
26) given above.

Preferences

Having eliminated some segments using the
above constraints, we may still be left with a
number of candidates for the next segment. In this
example we will be left with segments 13, 14 and
26. To choose the next segment we use the
following preference criteria (the weights used are
given in parentheses):

• Segments coplanar with the current
and previous segments are preferred
(weight=50000). In this case 13 and 14
will be preferred to 26, as they are
coplanar with 11 and 12 but 26 is not.

• Segments lying on the circle in which
previous segments lie are given a
greater weight as we prefer circular
terminators (weight=25000). Again, 13
and 14 lie on a circle through 11 and
12 but 26 does not.

• A segment closer to the current
segment is preferred to segments
farther away (weight ranges from 0 to
1000). Here 13 is preferred the most
and then 26 and 14.

• A segment more collinear with the
current segment is preferred (weight
ranges from 0 to 100). Again, 13 is
more collinear with 12 than is 26 or 14
with 12 and is thus preferred over
them.

Note that the first two criteria are assigned much
more weight than the other two. However, the first two
criteria are not mandatory as they may not hold and then
we have to depend on the other two. The extent to which
a segment is preferred is determined by adding the
weights and comparing the net weight with the net weight
for competing segments. The segment with the highest
weight is selected as the next segment. In this example,
both segments 13 and 14 satisfy the first two criteria but
13 gets a greater weight from the last two criteria and is
thus selected as the next segment.
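The additive weighting just described can be sketched as below. The four weight constants come from the text; how closeness and collinearity are mapped onto their 0 to 1000 and 0 to 100 ranges is not stated, so the normalized scores in [0, 1] and the candidate values in the usage example are hypothetical.

```python
W_COPLANAR = 50000       # coplanar with current and previous segments
W_ON_CIRCLE = 25000      # lies on the circle through previous segments
W_CLOSENESS_MAX = 1000   # upper end of the closeness range
W_COLLINEAR_MAX = 100    # upper end of the collinearity range

def net_weight(coplanar, on_circle, closeness, collinearity):
    """Net preference weight of a candidate next segment.

    closeness and collinearity are assumed to be pre-normalized to [0, 1].
    """
    return (W_COPLANAR * coplanar
            + W_ON_CIRCLE * on_circle
            + W_CLOSENESS_MAX * closeness
            + W_COLLINEAR_MAX * collinearity)

def choose_next(candidates):
    """Id of the candidate with the highest net weight."""
    return max(candidates, key=lambda k: net_weight(**candidates[k]))

# Hypothetical scores for the three candidates left in the example.
example = {
    13: dict(coplanar=True, on_circle=True, closeness=0.9, collinearity=0.8),
    14: dict(coplanar=True, on_circle=True, closeness=0.3, collinearity=0.2),
    26: dict(coplanar=False, on_circle=False, closeness=0.6, collinearity=0.1),
}
```

Because the first two weights dominate, the last two criteria act mainly as tie-breakers among candidates that agree on coplanarity and circularity.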

4.3. Choosing GCs

We now have several possible GC descriptions of
the same volume and we need a method of rating them
and choosing the better descriptions. For this we
characterize the GCs by finding the attributes of the
contour generator(s) and the terminator(s). The
characteristics of cgs used are: their length and whether
they are parallel or not. The characteristics of terminators
used are: closed or open, planarity and circularity. If a
planar terminator is found we compute its normal. If the
terminator is circular, we compute the center and radius of
the circle; else we compute its centroid and the average
distance of the centroid from the boundary segments.

Preference measures are then computed for each GC
based on its attributes. Associated with each attribute is a
weight indicating its relative importance. Note that these
weights are order of magnitude numbers. We illustrate
this method by considering the GC found in the example
of figure 4 with cg1=1 and cg2=6 and terminators t1=(11,
12, 13, 14, 7, 8, 9, 10) and t2=(2, 3, 4, 5). Remarks for this
example are made parenthetically.

1. Two parallel terminators (highly preferred);
weight=5000 (t1 and t2 satisfy this).

2. Planar, closed terminator; weight=1000 (t1
satisfies this).

3. Planar terminator, not closed; weight=600 (t2
satisfies this).

4. Closed terminator, not planar; weight=100 (this
case doesn't hold here).

5. Circular terminator, in addition to being planar;
weight=100 (both t1 and t2 satisfy this).

6. Terminators with more segments are preferred
(they give more information about the
cross-section); weight=10 per segment in terminator
(t1 gets a weight of 10*8=80 and t2 gets a
weight of 10*4=40).

7. Longer contour generators are assigned a
higher weight than shorter ones (we prefer a
GC description with a longer axis than one
with a shorter one);
weight=(length-of-cg)*100/(maximum-length-of-segments-in-the-scene)
(assuming cg1 and cg2 to be the
longest segments in the scene, we get a
weight of 100 for each cg).

8. Parallel contour generators are preferred (we
prefer cylinders over cones); weight=10 (cg1
and cg2 satisfy this).

9. Terminators normal to the contour generators
are preferred; weight=100 (cg1 and cg2 are
normal to t1 and t2).

The preference measure of each GC is found as the
sum of the weights of its attributes. (In the GC of this
example, the preference measure is 7020.) The GCs are
then sorted according to their preference measures.

GCs that have the same segments are assumed to
describe the same volume. Therefore, a GC that is disjoint
from any GC of higher preference is treated as a separate
object.
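A minimal sketch of this rating and selection step follows. The weights are those listed above, but the dictionary layout, the function names, and in particular the convention of adding the per-terminator weights once for each terminator are assumptions; under a different tallying convention the total for a given GC would differ.

```python
def preference_measure(gc, max_segment_length):
    """Sum of attribute weights for one GC hypothesis (assumed tallying)."""
    score = 0.0
    terminators = gc["terminators"]
    if len(terminators) == 2 and gc["terminators_parallel"]:
        score += 5000
    for t in terminators:
        if t["planar"] and t["closed"]:
            score += 1000
        elif t["planar"]:
            score += 600
        elif t["closed"]:
            score += 100
        if t["planar"] and t["circular"]:
            score += 100
        score += 10 * len(t["segments"])      # 10 per terminator segment
    for cg_length in gc["cg_lengths"]:
        score += cg_length * 100 / max_segment_length
    if gc["cgs_parallel"]:
        score += 10
    if gc["terminators_normal_to_cgs"]:
        score += 100
    return score

def separate_objects(scored_gcs):
    """scored_gcs: list of (score, frozenset of segment ids).

    A GC is reported as a separate object only if it shares no segment
    with any GC of higher preference.
    """
    ordered = sorted(scored_gcs, key=lambda g: g[0], reverse=True)
    objects, seen = [], []
    for score, segments in ordered:
        if all(segments.isdisjoint(s) for s in seen):
            objects.append((score, segments))
        seen.append(segments)
    return objects
```

For instance, `separate_objects([(50, frozenset({1, 2, 3})), (10, frozenset({1, 2})), (20, frozenset({8, 9}))])` keeps the first and third hypotheses and drops the second, which overlaps a better-rated GC.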


4.4. Describing The Objects As GCs

We would now like to describe the objects found in
the scene in terms of GCs. This means that for each
object we find the three GC functions, which are: the axis
curve, the cross-section and the cross-section function.
The axis curve is the 3-D locus of the centroid of the
cross-sections of the GC. The cross-section function
indicates the way the cross-section size changes as we
go along the axis (Shafer's transformation rule). The
cross-section indicates the cross-section shape. (Since we
model objects as Right GCs, the cross-section plane is
assumed to be normal to the axis.)

These functions must be deduced from the contour
generators and terminators. We find the axis curve as the
line of symmetry between the contour generators.
The cross-section function is defined by the distance of
either contour generator from the axis. The cross-section
shape is decided on the basis of the shape of the
preferred terminator, if one or more has been found. A
circular terminator is preferred the most, next a closed
terminator and then one with a larger number of
segments.

In the case of a circular terminator, only a part of
which is seen (due to occlusion, say), we take the
cross-section to be the complete circle. In the case when only
part of one contour generator is seen, we hypothesize that
the object extends as far as the longer contour generator
and then compute the axis curve and the cross-section
function.
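For two straight contour generators, the axis and the cross-section function described in this section can be approximated as follows. The sketch assumes that the two cgs are given with corresponding end-points (start with start, end with end) and that half the separation between corresponding points is an adequate stand-in for the distance of a cg from the axis; both are simplifying assumptions, and `describe_lshgc` is a hypothetical name.

```python
import math

def midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))

def describe_lshgc(cg1, cg2):
    """Return (axis, radius) for two straight contour generators.

    axis is a pair of 3-D points on the line of symmetry between the cgs;
    radius(t) gives the cross-section size at fraction t along the axis,
    varying linearly between the two ends.
    """
    axis = (midpoint(cg1[0], cg2[0]), midpoint(cg1[1], cg2[1]))
    r_start = math.dist(cg1[0], cg2[0]) / 2
    r_end = math.dist(cg1[1], cg2[1]) / 2

    def radius(t):
        return r_start + t * (r_end - r_start)

    return axis, radius
```

Parallel cgs give a constant `radius`, as for a cylinder, while cgs meeting at a point give a radius that falls linearly to zero, as for a cone.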

5.  AN  EXAMPLE  TO  ILLUSTRATE  THE  METHOD 

We now illustrate the working of our algorithm with
the example of figure 4. In this case the cg directed
method explores $\binom{n}{2} = \binom{26}{2} = 325$ possibilities (here n is the
number of segments in the scene). Of these only 32 pairs
have coplanar cgs and 7 of these have corresponding
terminators. The method decides there is one object: the
GC with cgs (1, 6) and terminators (11, 12, 13, 14, 7, 8, 9,
10) and (2, 3, 4, 5), as it is the best GC and all other GCs
are its subsets. This GC is given a very high preference
measure because it has planar, parallel terminators, one of
which is closed, and it has long, parallel cgs. Other GCs
are found but discarded because their preference
measures are lower and they are subsets of the best GC.
For example, a GC with (2, 7) as cgs and (1, 10, 3, 8, 7) and
(3, 4, 5, 6) as terminators is also found but has a lower
preference measure because of its short axis and
non-planar terminators. It is a subset of the best GC and is
thus discarded.

The terminator directed method first looks for
coplanar sets and finds 37 of them. Of these, it finds cgs
for just two of them. For the sets (7, 8, 9, 10, 11, 12, 13,
14) {planar and closed} and (2, 3, 4, 5) {planar only}, the
cgs found are 1 and 6. The corresponding GC is rated
high. Other terminators like (1, 2, 11) are also found but
the corresponding GCs are not rated high because the
terminators are not closed and corresponding cgs cannot
be found (the hypothesis cannot be verified). Here again,
other hypotheses are rejected because they are poorer
and are subsets of the best hypothesis.

For this example, the two methods (cg directed and
terminator directed) find the same hypothesis. The cg and
terminator sets of this hypothesis are used to compute
the GC functions, which are then used to generate slices
for display.

6.  RESULTS 

The above algorithm has been implemented as a
computer program and has been tested on a number of
synthetic images. We present a few of the results here.

Figure 5 (a) shows the sparse data for a cone with
the segments numbered. It has a number of surface
markings. The boundary is also sparse and there are
segments missing at the terminator. The corners
(junctions between terminator and cgs) are also missing.
This is similar to the output of a stereo program. The cg
directed method explores 46 possibilities, 24 of which
have coplanar cgs and 3 of which have terminators. The
terminator directed method finds just one coplanar set
(with 3 or more segments) and the corresponding cgs for
it. The best GC found has (1, 10) as the cg pair and (3, 4,
5, 6) as the corresponding terminator. The GC generated
from the functions corresponding to this figure is shown
in figure 5 (b). We may note that although we see just a
part of the cross-section, our program hypothesizes a
complete circular cross-section and outputs a GC
accordingly.

Figure 6 (a) shows the sparse data for a more
complex scene, with several objects occluding one
another, and with surface markings and missing segments.
There are 64 segments here. The cg directed method
explores 2016 possibilities, of which about 445 have
coplanar cgs and 35 of them have corresponding
terminators. The terminator directed method finds 40
coplanar sets, of which only three have corresponding cgs.
The GCs found are shown in Figure 6 (b).


Figure 5: A Cone
(a) Sparse 3-D Data, (b) Corresponding GC

Figure 6: Several Objects Occluding One Another
(a) Sparse 3-D Data, (b) Corresponding GCs

7. CONCLUSIONS AND FUTURE WORK

We have presented an approach for segmentation
and description of 3-D objects as generalized cones from
sparse 3-D data in the presence of surface markings,
noise and missing segments. Our algorithm has been
tested so far with images of Linear Straight Homogeneous
Generalized Cones with occlusion.


Our work represents only an initial effort in this
direction and many important problems remain to be
solved; some are given below:

• Our program does not handle non-linear non-
straight homogeneous GCs. However, we believe we can
extend the methods of LSHGCs by considering the non-linear
non-straight homogeneous GCs as piecewise
LSHGCs.

• Use of surface markings to strengthen the
hypothesis of a GC.

• We have not explored control issues yet. How do
we weigh the hypotheses of different methods, which to
explore first and which to prune?

• The technique of finding a continuous curve
amidst noise needs to be made more powerful.

• The data for horizontal/near-horizontal segments
is either missing or extremely sparse in the output of
stereo programs. At these places we need to use the 2-D
data, in addition to the available 3-D data.


1. Shafer, S.A., "Shadow Geometry and Occluding
Contours of Generalized Cylinders," Tech. report
CMU Report CS-83-131, May 1983.

2. Nevatia, R., Machine Perception, Prentice Hall,
1982.

3. Binford, T.O., "Visual Perception by Computer,"
IEEE Conference on Systems and
Controls, December 1971.

4. Nevatia, R. and Binford, T.O., "Description and
Recognition of Complex-Curved Objects,"
Artificial Intelligence, Vol. 8, 1977, pp.
77-98.

5. Marr, D., "Analysis of Occluding Contour,"
Proceedings of the Royal Society of
London, 1977, pp. 441-475.

6. Brooks, R.A., "Symbolic Reasoning among 3-D
Models and 2-D Images," Tech. report Memo
AIM-343, Stanford Artificial Intelligence Laboratory,
June 1981.

7. Grimson, W. and Marr, D., "A Computer
Implementation of a Theory of Human Stereo
Vision," Proceedings of DARPA Image
Understanding Workshop, Palo Alto, Calif.,
April 1979, pp. 41-47.

8. Terzopoulos, D., Multiresolution
Computation of Visible-Surface
Representations, PhD dissertation,
Massachusetts Institute of Technology, Departments
of Computer Science and Electrical Engineering,
January 1984.

9. Medioni, G. and Nevatia, R., "Segment-based Stereo
Matching," Proceedings of DARPA Image
Understanding Workshop, Washington DC,
June 1983.



Equivalent Descriptions of Generalized Cylinders

Kenneth S. Roberts
Department of Computer Science, Columbia University, New York, N.Y.



Abstract 

The 'equivalence' problem for shape descriptions is that a single
three-dimensional shape may have several different descriptions.
The Slant Theorem (Shafer [1]) for equivalent generalized cylinder
descriptions was proved under the restrictions that the same radius
function and the same axis be used for all the descriptions. A proof
is given that the theorem still holds when the 'same radius
function' condition is removed. It does not hold when the 'same
axis' condition is removed. The ellipsoid is a counter-example.

Introduction 

The equivalence problem for shape description is that a single
three-dimensional shape may have several different, equivalent
descriptions. One way to deal with this problem is to use a method
of generating descriptions which guarantees that the description
produced is always a unique, canonical representation. The other
approach is to permit alternate descriptions, but be able to tell
when two descriptions are equivalent, i.e. describe the same shape.

Shafer [1] investigated this second approach for a class of generalized
cylinders. After eliminating the trivial equivalences due to rotation,
etc., Shafer gave theorems about some families of equivalent
descriptions.

The  Slant  Theorem 

Following Shafer [1], a generalized cylinder is Straight if its axis (or
spine) is a straight line segment. It is Homogeneous if all its cross-sections
have the same shape except for scale. A Straight
Homogeneous Generalized Cylinder (SHGC) is given by the four-tuple
$(A, C, r, \alpha)$ (see Figure 1).


Figure 1: Straight Homogeneous Generalized Cylinder

$A$ is a line segment in 3-space called the Axis or spine. It is
parameterized in $s$, and an $s$-coordinate may be defined coinciding
with the Axis. $\alpha$ is the (constant) angle of each cross-section plane
to the Axis. $C$ is the planar cross-section curve. Coordinates $u$ and
$v$ may be defined for the cross-section plane, such that the $u$-coordinate
coincides with the projection of the Axis onto the cross-section
plane. $C$ may then be parameterized in $t$: $C(t) = (u(t), v(t))$.
$r(s)$ is the radius function, which gives the scale of the
cross-section at each point along the Axis ($C$ gives the shape of the
cross-section, $r$ gives the scale). So in $suv$-space, each point on the
surface is given in terms of parameters $s$ and $t$ by
$(s,\ r(s)u(t),\ r(s)v(t))$. A mutually orthogonal set may be formed by replacing
the $u$-coordinate with a $w$-coordinate perpendicular to the Axis and
the $v$-coordinate. Then in $swv$-space, the point on the surface given
by parameters $s$ and $t$ is
$(s + r(s)u(t)\cos\alpha,\ r(s)u(t)\sin\alpha,\ r(s)v(t))$. In
this paper, it is assumed that the Axis function $A(s)$ is linear, and
that the radius function $r(s)$ and cross-section function $C(t)$ are
piecewise $C^1$.
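The $swv$-space parameterization can be made concrete with a short sketch. The function name `shgc_point` and the example radius and cross-section functions below are illustrative assumptions, not part of the original formulation.

```python
import math

def shgc_point(s, t, r, u, v, alpha):
    """Return the (s, w, v) coordinates of the surface point with parameters (s, t).

    r is the radius function, (u, v) the cross-section curve C(t) = (u(t), v(t)),
    and alpha the constant angle between each cross-section plane and the Axis.
    """
    scale = r(s)
    return (
        s + scale * u(t) * math.cos(alpha),  # position along the Axis
        scale * u(t) * math.sin(alpha),      # w: perpendicular to the Axis and to v
        scale * v(t),                        # v: unchanged by the slant
    )

# Hypothetical example: circular cross-section with a linear radius function.
radius = lambda s: 1.0 + 0.5 * s
right_point = shgc_point(2.0, 0.0, radius, math.cos, math.sin, math.pi / 2)
oblique_point = shgc_point(2.0, 0.0, radius, math.cos, math.sin, math.pi / 4)
```

With $\alpha = \pi/2$ the first coordinate reduces to $s$, so every cross-section is perpendicular to the Axis; for any other angle the $u$ extent of the cross-section leaks into the position along the Axis.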

An SHGC is a Right SHGC if its cross-section angle $\alpha = \pi/2$.
Otherwise it is an Oblique SHGC. An SHGC is Linear if its radius
function $r(s)$ is linear. The Slant Theorem (Shafer [1], page 109 and
Appendix E) states that:

An Oblique Straight Homogeneous Generalized Cylinder (SHGC)
has an equivalent Right SHGC if and only if the radius function
of the Oblique SHGC is Linear. (Two otherwise equivalent
descriptions which have differently sloped ends are regarded as
equivalent for the purposes of this theorem).

The theorem was proved under the restricted conditions that the
same radius function and same Axis be used for both the Oblique
and the Right SHGCs. The question arises whether the theorem
still holds when these conditions are relaxed.

The 'same radius function' condition

The Slant Theorem still holds when the 'same radius function'
condition is removed. The 'if' part of the theorem ('Linear radius
function implies the existence of an equivalent Right SHGC') is
already true from the restricted form of the theorem. So what must
be proved is the following:

Given an Oblique SHGC $G = (A, C, r, \alpha)$ where radius function $r$
is non-linear, there does not exist any equivalent Right SHGC
$G^* = (A, C^*, r^*, \pi/2)$ which has the same Axis as $G$ (without restriction
on the radius function $r^*$ of $G^*$).

Proof: The basic idea is that at least one of the angled cross-sections
of the Oblique SHGC will be on a non-linear bend in the
radius function $r(s)$. But the bend must be spread over a wider
range of cross-sections in the Right SHGC, and there is no way for
the radius function to consistently handle all of them.

Given an Oblique SHGC $G = (A, C, r, \alpha)$, the 'zigzag' construction
shall be defined as follows for a value of $s = s_2$, and values of
$t = t_L$ and $t = t_M$ (see Figure 2). Call the point given by $s = s_2$ and
$t = t_L$, $L_2$; similarly for $M_2$. Working in $swv$-space, the points may
be displayed in a plot of $w$ against $s$ (see Figure 2). The coordinates
of $L_2$ in $swv$-space are
$(s_2 + r(s_2)u(t_L)\cos\alpha,\ r(s_2)u(t_L)\sin\alpha,\ r(s_2)v(t_L))$.

The set of points
$(s + r(s)u(t_L)\cos\alpha,\ r(s)u(t_L)\sin\alpha,\ r(s)v(t_L))$ for all
$s$ forms a curve in $swv$-space; call it 'curve L' (likewise 'curve
M'). Take the plane in $swv$-space perpendicular to the Axis which
contains $L_2$. Call the intersection of that plane with curve M, point
$M_1$. Call the $s$-value for that point $s_1$. Now take the plane which is
at an angle $\alpha$ to the Axis and contains $M_1$. Call the intersection of
that plane with curve L, point $L_1$. Similarly, work in the other
direction to define $s_3$, $L_3$, and $M_3$ (see Figure 2, which plots only the
$w$ and $s$ coordinates).

Figure 2: The zigzag construction

(For some SHGCs and values of $s$ and $t$, it may be that the
intersection of the curve L and the plane in $swv$-space may include
more than one point, or even a line (but not less than one point). In
such cases, it is fairly easy to see that all the cross-sections
perpendicular to the Axis cannot have the same shape, in which
case no Right SHGC can be constructed which is homogeneous, and
the theorem is satisfied. So in what follows it will be assumed that
each point in the Oblique cross-section maps to exactly one point in
the Right cross-section.)

Since $r(s)$ is non-linear, there exists some value $s = s_2$, some $\epsilon > 0$,
and some real value $m$, such that for all $s$ in $(s_2 - \epsilon, s_2)$, the slope of
$r(s)$ is less (greater) than $m$, and for all $s$ in $(s_2, s_2 + \epsilon)$, the slope of
$r(s)$ is greater (less) than $m$.

It is evident that by choosing $t_L$ and $t_M$ close enough together, $L_2$
and $M_2$ can be chosen with $u(t_L)$ and $u(t_M)$ close enough together so
that the zigzag construction can be made with $s_1$ in $(s_2 - \epsilon, s_2)$ and
$s_3$ in $(s_2, s_2 + \epsilon)$. Further, this can be done so that $u(t_L)$ and $u(t_M)$
are equal neither to each other nor to 0, because SHGC $G$ is a
closed figure. (As an example of a non-closed figure which does not
satisfy this theorem, take as the cross-section a line segment parallel
to the $v$-coordinate, and any reasonable non-linear radius function).

From the way in which this zigzag construction has been done, it is
clear that the slopes of line $L_1L_2$ and line $L_2L_3$ are not equal
(likewise for line $M_1M_2$ and line $M_2M_3$).

Lemma: The slopes of line $L_1L_2$ and line $L_2L_3$ are equal (likewise
for line $M_1M_2$ and line $M_2M_3$) if and only if $r(s_2)/r(s_1) = r(s_3)/r(s_2)$.


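Proof: Using $(s, w)$ coordinates (and ignoring the $v$-coordinate), it
can be seen from the way in which the zigzag construction is done
that

$$L_1 = (s_1 + r(s_1)u(t_L)\cos\alpha,\ r(s_1)u(t_L)\sin\alpha),$$

$$M_1 = (s_1 + r(s_1)u(t_M)\cos\alpha,\ r(s_1)u(t_M)\sin\alpha),$$

$$L_2 = (s_2 + r(s_2)u(t_L)\cos\alpha,\ r(s_2)u(t_L)\sin\alpha).$$

But also

$$L_2 = (s_1 + r(s_1)u(t_M)\cos\alpha,\ r(s_2)u(t_L)\sin\alpha).$$

So the slope of line $L_1L_2$

$$= \frac{r(s_2)u(t_L)\sin\alpha - r(s_1)u(t_L)\sin\alpha}{(s_1 + r(s_1)u(t_M)\cos\alpha) - (s_1 + r(s_1)u(t_L)\cos\alpha)}$$

$$= \tan\alpha \left[\frac{u(t_L)}{u(t_L) - u(t_M)}\right] \left[1 - \frac{r(s_2)}{r(s_1)}\right]$$

Likewise it can be shown that the slope of line $L_2L_3$

$$= \tan\alpha \left[\frac{u(t_L)}{u(t_L) - u(t_M)}\right] \left[1 - \frac{r(s_3)}{r(s_2)}\right]$$

And the result follows (with the same argument for $M_1M_2$ and
$M_2M_3$).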
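The closed-form slope in the Lemma can be checked numerically. The sketch below is an illustration under stated assumptions: `r_prev` and `r_next` stand for the radius values at two consecutive zigzag points on curve L, `u_L` and `u_M` for the $u$ values of the two chosen cross-section points, and the function name is hypothetical. It enforces the one constraint the construction provides, namely that the earlier point on curve M and the later point on curve L lie in the same plane perpendicular to the Axis.

```python
import math

def zigzag_slopes(r_prev, r_next, u_L, u_M, alpha, s_prev=0.0):
    """Return (direct slope, closed-form slope) for two consecutive points on curve L."""
    c, sn = math.cos(alpha), math.sin(alpha)
    # Perpendicular-plane constraint of the construction:
    #   s_prev + r_prev*u_M*cos(alpha) == s_next + r_next*u_L*cos(alpha)
    s_next = s_prev + c * (r_prev * u_M - r_next * u_L)
    # (s, w) coordinates of the two points on curve L.
    p_prev = (s_prev + r_prev * u_L * c, r_prev * u_L * sn)
    p_next = (s_next + r_next * u_L * c, r_next * u_L * sn)
    direct = (p_next[1] - p_prev[1]) / (p_next[0] - p_prev[0])
    closed_form = math.tan(alpha) * (u_L / (u_L - u_M)) * (1.0 - r_next / r_prev)
    return direct, closed_form

direct, closed_form = zigzag_slopes(r_prev=1.0, r_next=1.3, u_L=0.8, u_M=0.5,
                                    alpha=math.pi / 3)
```

Because the closed form depends on the radii only through the ratio `r_next / r_prev`, two successive segments have equal slope exactly when the two successive radius ratios are equal, which is the statement of the Lemma.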

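Now using the Lemma, we get the result that
$r(s_2)/r(s_1) \neq r(s_3)/r(s_2)$.

But if $G^*$ were a valid SHGC, with its radius and cross-section
functions $r^*(s)$ and $C^*(t)$, the following would hold:

$$\frac{r(s_1)u(t_M)\sin\alpha}{r(s_2)u(t_L)\sin\alpha} = \frac{r^*(s_a^*)u^*(t_M^*)}{r^*(s_a^*)u^*(t_L^*)} = \frac{r^*(s_b^*)u^*(t_M^*)}{r^*(s_b^*)u^*(t_L^*)} = \frac{r(s_2)u(t_M)\sin\alpha}{r(s_3)u(t_L)\sin\alpha}$$

where $s_a^*$ and $s_b^*$ are the Axis positions of the perpendicular
cross-sections containing $M_1$, $L_2$ and $M_2$, $L_3$ respectively, and
the middle equality is due to the 'Homogeneous' part of
'SHGC' (all cross-sections must have the same shape, up to scale).
These equalities would imply that $r(s_2)/r(s_1) = r(s_3)/r(s_2)$. Since
this has been shown to be false, no equivalent Right SHGC can
exist.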

The 'same axis' condition

The theorem does not hold if the 'same Axis' restriction is
removed. If different axes are permitted, then there are non-Linear
SHGCs that have different, equivalent SHGC descriptions. The
sphere is a trivial counter-example that will not be considered, since
its alternate descriptions differ only by rotation.

But there are non-trivial counter-examples. Consider a right
ellipsoid with center at the origin in Cartesian 3-space. It can be
represented in equation form as:

$$x^2/a^2 + y^2/b^2 + z^2/c^2 = 1$$

Thinking in terms of generalized cylinders, and taking the x-axis as
the Axis, we have a Right Non-linear SHGC, with elliptical
cross-sections.

Now suppose we slant the Axis by an angle $\alpha$ in the x-y plane, but
leave the elliptical cross-sections parallel to the y-z plane (a kind of
skew transformation). This 'oblique' figure is clearly an Oblique
Non-linear SHGC, again with elliptical cross-sections. This 'slant'
transformation can be carried out in the equation representation by
replacing $y$ with $y - x\tan\alpha$ and rearranging to get:

$$x^2(1/a^2 + \tan^2\alpha/b^2) - xy(2\tan\alpha/b^2) + y^2/b^2 = 1 - z^2/c^2$$

Analytic geometry texts show that the left side is the equation of a
family of ellipses that have been rotated in the x-y plane by an
angle

$$\theta = \tfrac{1}{2}\arctan\left[\frac{2\tan\alpha}{1 - b^2/a^2 - \tan^2\alpha}\right]$$

These ellipses are centered on the z-axis, and it is easy to show that
their orientation and eccentricity are independent of the value of $z$.
They all have the same shape. So this 'oblique' figure may be
represented as a Right Non-Linear SHGC, with Axis on the z-axis,
and elliptical cross-sections.
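The claims about the slanted ellipsoid can be checked numerically. The following is a minimal sketch, assuming the standard rotation of a conic $Ax^2 + Bxy + Cy^2$; the function names and the sample values of $a$, $b$, $c$ and the slant angle are illustrative assumptions. It verifies that rotating by the stated angle removes the $xy$ term, and that the axis ratio of the cross-section at height $z$ does not depend on $z$.

```python
import math

def skewed_conic(a, b, alpha):
    """Coefficients (A, B, C) of A x^2 + B xy + C y^2 after y -> y - x tan(alpha)."""
    t = math.tan(alpha)
    return (1 / a**2 + t**2 / b**2, -2 * t / b**2, 1 / b**2)

def rotation_angle(a, b, alpha):
    """Rotation angle of the skewed ellipses in the x-y plane."""
    t = math.tan(alpha)
    return 0.5 * math.atan(2 * t / (1 - b**2 / a**2 - t**2))

def rotate_conic(A, B, C, theta):
    """Coefficients after x = x' cos(theta) - y' sin(theta), y = x' sin(theta) + y' cos(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    A2 = A * c * c + B * s * c + C * s * s
    B2 = 2 * (C - A) * s * c + B * (c * c - s * s)
    C2 = A * s * s - B * s * c + C * c * c
    return A2, B2, C2

def axis_ratio_at(z, a, b, c_semi, alpha):
    """Ratio of the semi-axes of the z = const cross-section of the skewed ellipsoid."""
    A2, _, C2 = rotate_conic(*skewed_conic(a, b, alpha), rotation_angle(a, b, alpha))
    rhs = 1 - z**2 / c_semi**2  # right-hand side scales the ellipse, not its shape
    return math.sqrt(rhs / A2) / math.sqrt(rhs / C2)

A2, B2, C2 = rotate_conic(*skewed_conic(3.0, 2.0, 0.4), rotation_angle(3.0, 2.0, 0.4))
```

Here `B2` is zero up to rounding, and `axis_ratio_at` returns the same value for every $|z| < c$, since $z$ enters only through the right-hand side of the equation.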

This type of result is not limited to ellipsoids. But the ellipsoid has
this additional property: the 'oblique' figure is simply another
right ellipsoid, rotated from the x-axis by the angle $\theta$ given above.
If the rotation by $\theta$ is carried out on the equation representation,
the result is:

$$\frac{x^2}{a^2}\left[\cos^2\theta + \frac{a^2}{b^2}\left(\tan^2\alpha\cos^2\theta - 2\tan\alpha\cos\theta\sin\theta + \sin^2\theta\right)\right]$$

$$+\ \frac{y^2}{b^2}\left[\cos^2\theta + 2\tan\alpha\cos\theta\sin\theta + \tan^2\alpha\sin^2\theta + \frac{b^2}{a^2}\sin^2\theta\right]$$

$$+\ \frac{z^2}{c^2} = 1$$

The eccentricity is different from that of the original right ellipsoid,
as we would expect.

So the 'oblique' figure can also be represented as a Right non-Linear
SHGC with the Axis in the x-y plane at angle $\theta$. Thus
'being a right ellipsoid' is a non-Linear property of SHGCs which is
invariant under skew transformations. To put it another way, there
is no such thing as an oblique ellipsoid.


It is interesting that while the z-axis representation depends on
being able to take advantage of the freedom to orient the Axis
anywhere in three dimensions, the 'angle $\theta$' representation also
works as a counter-example to the two-dimensional analog of the
Slant Theorem.

So there are some Non-linear Oblique SHGCs which are equivalent
to Right SHGCs, and therefore the 'only if' part of the Slant
Theorem does not hold without the 'same axis' condition.


Families of descriptions with different axes

Define an H-axis for a shape as a line which is the Axis for some
SHGC description for that shape. An RH-axis is an Axis for some
Right SHGC description. Shafer in his Pivot Theorem ([1], p. 105
and Appendix F) has described families of H-axes which all use the
same cross-sections, which exist only for Linear SHGCs. Other
classes of shapes that have multiple H-axes:

1. There is an H-axis lying in the x-y plane, and the equation
representation for the shape can be written in the form

$$f(x, y) = g(z)$$

and $f$ satisfies

$$f(kx, ky) = h(k) f(x, y)$$ for some function $h$.

Then the z-axis is an RH-axis. For example:

$$(x/a)^n + (y/b)^n + (z/c)^n = 1$$

2. The cross-section of a Right SHGC itself has multiple H-axes in
its plane. For example, a square has four H-axes, a regular
pentagon five, and a circle infinity. Any radius function can be
used. These also satisfy the property of the previous type above.
Included here could be spheres, cylinders, right prisms with
polygonal bases, tetrahedra, octahedra.

3. Various elongations and skews of the first two types. Included
here would be oblique prisms with polygonal bases and irregular
tetrahedrons. The ellipsoid can be seen as an elongated sphere.

References

[1] S. A. Shafer, Shadow Geometry and Occluding Contours of
Generalized Cylinders. PhD dissertation, Carnegie-Mellon
University, May 1983.

This research was supported by DARPA contract N00039-84-C-0165
and by an AT&T Bell Laboratories graduate scholarship.



The  Calibrated  Imaging  Lab  Under  Construction  at  CMU 

Steven  A.  Shafer 
Computer  Science  Department 
Carnegie-Mellon University
Pittsburgh,  PA  15213 
5  November  1985 


Abstract 

This document describes the Calibrated Imaging Laboratory, a
facility for precision digital imaging under construction at CMU.
The purpose of this lab is to provide images with accurate
knowledge about ground truth (concerning the scene, illumination,
and camera) so that computer vision theories and methods can be
tested on real images and evaluated to determine how accurate
they really are. The lab aims to provide ground truth data accurate,
in the best circumstances, to the nearest pixel geometrically and
the nearest 8-bit pixel value photometrically. There are also many
illumination and imaging facilities in the lab that provide increased
flexibility or increased complexity of the visual situation, at a cost of
reduced precision in the ground truth data.

To accomplish these goals, the lab includes mechanisms to
carefully control and measure the direct and indirect illumination in
the scene, the positions of objects, and the properties of the
camera/digitizer system. Lighting can be provided by a near-point
source (5 mm diameter aperture) for high precision, or by a
general-purpose track lighting system for flexibility. The work area
can be surrounded by black curtains etc. to reduce stray light and
indirect illumination. The cameras include a very high-precision
CCD camera on a static mount, and an X-Y-Z-pan-tilt jig with
multiple inexpensive CCDs aligned with each other. Surveyors'
transits are used to measure positions of points in space, and other
calibration materials are available for all types of camera property
measurement. Color imaging by serial selection of filters is also
available.

The lab is described as we currently envision it will be equipped
when the facilities are operational; the current status is summarized
at the end of the paper.

1.  Introduction 


Computer vision research encompasses both theoretical and
empirical studies, but there has been a mismatch between the two.
Current theories typically depend upon unrealistic assumptions,
making it impossible to test them on real images. Because they are
only tested on synthetic data, most theories cannot be analyzed to
determine exactly what the sources of inaccuracy and error might
be in analyzing real images. Likewise, empirical studies using real
images usually cannot apply existing theories, so they rely on
guesswork and heuristics instead.

We are building a Calibrated Imaging Lab (CIL) at CMU to address
both aspects of computer vision by bridging the gap between
theories and real images. The CIL is a facility for capturing images
with high precision in a controlled environment. It will make
possible the study of both geometric and photometric computer
vision theories with high-fidelity images and good knowledge of
ground truth.

High precision images are important in computer vision studies
because they make it possible to evaluate the results of numeric
algorithms such as stereo or motion analysis. With precisely known
imaging geometry and object locations, it is possible to look at the
output from an algorithm and tell how accurately it performed,
rather than relying on a coarse, subjective evaluation. It is also
possible to pinpoint sources of systematic error or bias that are too
subtle to be identified when the ground truth or imaging geometry
are unknown or imprecise.

Control of the imaging environment is also important, especially
for photometric analysis of images. Photometric methods are
notorious for depending upon assumptions about the lighting,
background illumination and reflections, and camera pixel values
that are rarely found in practical situations. In fact, the assumptions
usually made in such research are extremely difficult to match in
practice, which may help to explain why so few photometric
methods have been successfully applied to real images. In order to
evaluate current theories and develop new theories based on more
realistic assumptions, it is necessary to have control over the direct
and indirect illumination in the image and to know the precise
photometric response of the camera.

In the Calibrated Imaging Laboratory (CIL) at CMU we are
addressing both the need for precise images and the need for a
controlled environment. For precise imaging, the CIL has the goal
of providing images accurate geometrically to the nearest pixel
location and photometrically to the nearest 8-bit pixel value. For
controlling the environment, the CIL has the goal of providing a
continuum of imaging situations ranging from a (near-)point light
source with no background illumination and a static high precision
camera, to commercial lighting fixtures with bright walls and a
movable platform for several low-cost cameras. A range of
complexity can thus be achieved in the CIL along several different
dimensions of the imaging situation.

It is our intention to use the CIL for several purposes:

• For testing existing theories for image understanding in
these areas:

o Shadow geometry and other geometric shape
inference methods

o Stereo (using two or more cameras), motion, and
combined motion stereo analysis

o Photometric and reflectance map methods for
shape recovery

o Color analysis of gloss and surface material




• For producing new theories dealing with more
complexity (and hence more realism) in these areas:

o Camera distortions

o Non-uniform illumination and extended light
sources

o Substantial background (diffuse) illumination

o Glossy and other non-Lambertian surfaces

• For evaluating the properties of commonly available
cameras used for machine vision.

• For providing access to high-precision image data to
interested researchers at CMU and elsewhere.

1.1  Overview 

This informal paper describes the facilities of the CIL. These
facilities include:

• Lighting Control provided by a near-point light source
(arc lamp) for precision shadow analysis, and a
complete track lighting system for flexible general
illumination.

• Background Reflection Control in a room with black
ceiling, black carpet, and black or white curtains, with
other colored backdrops as needed.

• High Precision Color Images provided by a custom-built
camera yielding 512x512x8 images with each pixel
value being repeatable (noise-free) and linearly related
to scene radiance, using color filters in a filter wheel.

• Precision Stereo and Motion Image Sets provided by a
mobile platform with precision X-Y-Z-pan-tilt controls
and a pair of CCD cameras aligned for stereo
correspondence.

• Objects including simple objects for viewing and a
scale model landscape that presents a variety of
surface property, motion, and occlusion situations.

• Calibration Data provided by appropriate tools,
including photometers, precision targets, and
calibration camera filters.

• Accurate Ground Truth Data provided by an optical
table with precision position control devices and
surveyors' transits for position measurement.

Although the paper is written in the present tense, the CIL is not yet
in complete existence. A status report on the current state of the
lab is included in this paper.

2.  Lighting  Control 

Figure 2-1 shows the overall layout of the CIL. The "business
area" is the optical table, 4x8 feet, with cameras at one end
looking at the "scene" at the other end. Lighting is provided by a
track lighting system overhead or an arc lamp on a stand; the whole
work area is enclosed in curtains to eliminate outside light. A desk
and storage area complete the lab (which is already straining the
size of the room!).


Figure 2-1: Layout of the Calibrated Imaging Lab

2.1 Direct Lighting by Normal Fixtures

Ordinary illumination for imaging is provided by a track lighting
system in the ceiling. The track layout allows many configurations
of light sources, which may be directly over the scene or camera, or
at virtually any angle and direction. Vertical positioning is provided
by wands that attach to the track and hang downward, allowing
fixtures to be mounted at approx. 1-foot intervals from the ceiling
height down to about 4 feet off the floor. Each fixture itself has a
gimbal allowing imprecise pan and tilt control (except the fixtures
for the spherical "globe" bulbs, which simply point downward).

We have a variety of fixtures, bulbs, and attachments. Of primary
interest are spot and flood lamps of various wattages up to 500 W,
mountable on gimbal fixtures with cowlings or reflectors. We also
have a variety of grilles and (uncalibrated) color filters for lighting
experimentation, and other bulbs such as globes.

The track itself has four circuits, each with a separate switch on
the wall. When a fixture is attached to the track, an adaptor is used
whose position and switch setting determine the circuit that will
power the fixture. We use one circuit for normal room lighting, two
for alternate imaging illumination configurations, and one for
auxiliary power when multiple high wattage bulbs are to be used.




We also plan to obtain a couple of vertical rectangular light
panels for studying light sources of various shapes, and we can
easily obtain fluorescent tubes to hang from the track.

2.2  Direct  Lighting  by  Point  Source 

High precision illumination control is provided by a "point
source", an arc lamp with a very small aperture. This lamp was
configured with the following specifications, assuming the scene is
1 m deep and is 3 m from the source:

• Shadow edges on surfaces facing the camera should
be no more than one pixel (about 1 mm) wide.

• The center of the illuminated spot should receive 1000
lux (100 foot-candles) of light, equivalent to
recommended office illumination for reading.

• The spot should be at least 4 feet wide, with at least 500
lux at the edge of the spot. (Uniform illumination would
be nice but is too demanding a specification.)

These specifications give rise to the following requirements:

•  The  aperture  must  be  less  than  1  cm  in  diameter. 

•  The  lamp  must  be  a  1000  W  arc  lamp. 

• The condensing lens must have f-number no greater
than 2.
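The link between aperture size and shadow sharpness can be illustrated with a similar-triangles estimate of the penumbra (the blurred band at a shadow edge). This is a sketch under a simplifying assumption, not the derivation used to set the requirements above: the aperture is treated as a uniform disk, the receiving surface is taken to be perpendicular to the light, and the 300 mm gap between the shadow-casting edge and the surface is a hypothetical value.

```python
def penumbra_width_mm(aperture_mm, source_to_edge_mm, edge_to_surface_mm):
    """Penumbra width by similar triangles: aperture * (edge-to-surface / source-to-edge)."""
    return aperture_mm * edge_to_surface_mm / source_to_edge_mm

def max_aperture_mm(max_width_mm, source_to_edge_mm, edge_to_surface_mm):
    """Largest aperture that keeps the penumbra within max_width_mm."""
    return max_width_mm * source_to_edge_mm / edge_to_surface_mm

# Hypothetical case: occluding edge 3 m from the source, shadow falling 300 mm behind it.
width_5mm = penumbra_width_mm(5.0, 3000.0, 300.0)   # 5 mm aperture
width_10mm = penumbra_width_mm(10.0, 3000.0, 300.0)  # 1 cm aperture
limit = max_aperture_mm(1.0, 3000.0, 300.0)          # aperture giving a 1 mm edge
```

Under these assumptions a 1 cm aperture gives a 1 mm shadow edge and a 5 mm aperture gives 0.5 mm; shadows cast over longer distances within the scene would be proportionally softer.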

We have such an arc lamp system, as configured (optically and
mechanically) by the manufacturer. It consists of a lamp housing
18 inches high, with a lens and precision 5 mm aperture in front;
these pieces are all mounted on a base plate 9 x 20 in. The
assembly weighs about 20 lb. and has a power supply attached by
several cables.

To support the arc lamp, we have a scaffold 6 feet high made of
pipes, with a movable platform attached to the base plate of the arc
lamp. The platform can slide vertically from 3 to 6 feet above the
floor, and can be tilted within ±45° from horizontal. The scaffold
can sit on a dolly to be pushed about on the floor.

Unfortunately, the arc lamp has some associated safety hazards:
high pressure of the lamp, a high-voltage ignition pulse, ultraviolet
light emitted by the lamp, and the high intensity of the source itself.
We have precautions such as a UV filter for the lamp and goggles to
wear, and a key switch for the power supply, but use of the arc lamp
will be minimized in the lab because of the hazards.

2.3  Diffuse  Illumination  Control 

For control of the background illumination, we would like a totally
non-reflective surrounding. In practice, we must accept a rather
poor solution due to the cost and maintenance problem of such an
environment. We settle for black, relatively matte curtains, a
similarly painted ceiling, and a dark gray carpet. We also have
white, fairly shiny curtains as an alternative for studying
high-reflectance backgrounds. We can always use backdrops or hang
other fabrics to achieve other effects.

The working area itself consists primarily of components painted
black, but since the optical table top is gray metal, we may need to
use black fabric pieces to minimize unwanted reflections from the
table top.

3.  Imaging  Facilities 

Figure 3-1 shows the layout of the working area, the optical table
containing the cameras and "scene" (the objects being viewed).
The optical table itself is a 4 x 3 foot metal table that is extremely
rigid, with a precise grid of 1/4-20 threaded holes on 1-inch centers
across the entire surface. Cameras are mounted on two platforms
on the table: the Mobile Camera Platform (MCP) for high-precision
camera motion studies and the Static Camera Platform (SCP) for
high-precision photometric studies.

Figure 3-1: Layout of the Optical Table

3.1 The Mobile Camera Platform

The Mobile Camera Platform (MCP) is shown in figure 3-2. It
provides high-precision X, Y, Z, pan, and tilt motion for a platform
12 x 12 inches square, with the rotation (pan/tilt) center placed at a
point 7 inches above the center of the platform. This is
accomplished by three components: an X-Y-Z jig, a pan/tilt
assembly, and position control for each camera mount placed on
the platform.

The X-Y-Z jig is constructed from long-travel translation stages
with crank controls and mechanical (odometer-style) position
readouts to 1/1000 inch resolution. Since a pixel (at 1 m with a
normal lens) is roughly 1/50 inch across, this is more than
adequate accuracy. The assembly has a 36 x 36 inch base
providing 24 x 24 inch travel horizontally, with a vertical
'picture-frame' assembly 32 inches (high) x 38 (wide) by 6 (thick)
on the front edge of the base that moves vertically through 12
inches of travel.

Pan  and  tilt  are  controlled  by  high-precision  manual  rotation 
stages  with  direct  measurement  to  .02°  (about  1  arc  minute)  of 
resolution  and  accuracy,  with  the  ability  to  point  in  any  direction 
(360°  pan  and  100°  tilt). 
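
As a rough check on the resolution figures quoted above, the stage
readouts can be expressed in pixel units. The sketch below uses only
the numbers given in the text (a pixel footprint of about 1/50 inch
at 1 m, translation readout to 1/1000 inch, rotation readout to .02°)
and a small-angle pinhole approximation. The function names are
hypothetical and introduced here for illustration only.

```python
import math

INCH_TO_M = 0.0254  # exact conversion factor


def translation_step_in_pixels(step_in=0.001, pixel_in=1 / 50):
    """Fraction of one pixel footprint covered by one translation readout step."""
    return step_in / pixel_in


def pixel_angle_deg(pixel_in=1 / 50, distance_m=1.0):
    """Angle subtended by one pixel footprint seen from the given distance."""
    return math.degrees(math.atan(pixel_in * INCH_TO_M / distance_m))


def rotation_step_in_pixels(step_deg=0.02, pixel_in=1 / 50, distance_m=1.0):
    """Approximate image shift, in pixels, for one rotation readout step."""
    return step_deg / pixel_angle_deg(pixel_in, distance_m)
```

Under these assumptions one translation step corresponds to about
1/20 of a pixel, and one rotation step to somewhat less than one
pixel, at a 1 m viewing distance.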

Each camera has a mount with X-Y-Z-pan adjustment on top of
the camera platform. This allows fine alignment of the cameras so
the centers of perspective expansion can be placed in the plane of
the pan and tilt axes and the optical axes of the cameras can be
placed normal to this plane. With appropriate alignment, epipolar
lines can be made to be scan lines, X-Y-Z translation will
correspond to image and depth translation, and pan/tilt rotation
will not induce any translational components. Roll or tilt
adjustments can be inserted into each camera mount if needed.
Alignment of the various positioning stages will be a complex task,
but we have the necessary degrees of freedom and viewing targets
if only we supply the patience.
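
The statement that epipolar lines become scan lines can be
illustrated with an idealized pinhole model. The sketch below
assumes two perfectly aligned cameras with parallel optical axes
along +Z, separated only along X; the function names, the focal
length in pixels, and the coordinate convention are assumptions
made for this example, not part of the lab's software.

```python
def project(point, camera_x, focal_px):
    """Pinhole projection for a camera at (camera_x, 0, 0) looking along +Z.

    Returns (column, row) in pixels relative to the principal point.
    """
    x, y, z = point
    return (focal_px * (x - camera_x) / z, focal_px * y / z)


def disparity(point, baseline, focal_px):
    """Column shift of a point between two cameras separated along X."""
    left = project(point, 0.0, focal_px)
    right = project(point, baseline, focal_px)
    # The row coordinate is identical in both views, so the match for a
    # pixel lies on the same scan line; only the column differs.
    assert abs(left[1] - right[1]) < 1e-9
    return left[0] - right[0]
```

In this model the disparity equals focal_px * baseline / Z, so a
translation along X shows up purely as a column shift that depends
on depth.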



Figure  3-2:  The  Mobile  Camera  Platform 

We have, however, noted a problem with alignment of cameras as
described above: when the focus of a typical camera lens is
adjusted, the center of perspective expansion moves along the
optical axis. The cameras must then be re-positioned along the
viewing direction if the perspective centers are to be kept in the
plane of the pan and tilt axes. This problem does not exist at all for,
say, outdoor vision with wide-angle lenses fixed at infinite focus; it
only arises where close-up photography imposes depth-of-field
restrictions that require changing the focus of the lenses from time
to time.
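
The size of this effect can be estimated with the thin-lens
equation. The sketch below is an idealization: it assumes a single
thin lens that is focused by moving the whole lens away from the
sensor, so the lens (and with it the center of projection) shifts by
the focusing extension. Real multi-element lenses behave
differently in detail, and the 25 mm focal length used in the note
below is a hypothetical value.

```python
def lens_extension_mm(focal_mm, object_distance_mm):
    """Distance the lens sits beyond its infinity-focus position.

    Thin-lens equation: 1/f = 1/s + 1/s', hence s' - f = f**2 / (s - f).
    """
    if object_distance_mm <= focal_mm:
        raise ValueError("object must be farther away than the focal length")
    return focal_mm ** 2 / (object_distance_mm - focal_mm)


def refocus_shift_mm(focal_mm, old_distance_mm, new_distance_mm):
    """Change in lens position when refocusing between two object distances."""
    return (lens_extension_mm(focal_mm, new_distance_mm)
            - lens_extension_mm(focal_mm, old_distance_mm))
```

For a 25 mm lens, refocusing from 1 m to 0.5 m gives a shift of
roughly 0.67 mm in this model, which is large compared with the
1/1000 inch (0.0254 mm) readout of the translation stages.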

There will be anywhere from one to four cameras mounted at
any time on the MCP. Normally, we expect to use a pair of very
inexpensive CCD cameras, with local modifications to the
electronics that provide about 6 bits per pixel of photometric
repeatability with excellent geometric fidelity. We will measure the
photometric linearity ourselves if needed. Video facilities are
shown in figure 3-3; they include one monitor for each camera,
a manual crosspoint video switch with inputs from the cameras
on both the Mobile and Static Camera Platforms and outputs to a
video patch panel for digitizing on any of our VAXes, and to a SUN
with one or more digitizing frame buffer cards. We have a variety of
lenses on hand.

The current MCP configuration has manual translation and
rotation stages on the jig and a manual video crosspoint for
digitizing; we have considered motorizing the entire facility. If we
undertake real-time tasks in the lab in the future, there might be
justification for such a considerable expense. By replacing manual
with motorized components, we could obtain translation to .001
inch resolution at 3 in/sec. and rotation to .01° resolution at 30
deg/sec. Digitization might be wired up with parallel lines for
real-time stereo.

3.2  The  Static  Camera  Platform 

In addition to the Mobile Camera Platform, there is a Static
Camera Platform (SCP) intended for high-precision photometric
control. The SCP itself is simply a 2 x 4 foot platform elevated 20
inches above the optical table, occupying one end of the table. The
2-foot depth will allow virtually any combination of color filters,
polarizers, etc. to be placed in front of the cameras on the SCP.

The featured camera on the SCP is a high-precision camera
custom-built for us to the following specifications:

• The output image is 512 x 512 x 8 bits, with square
pixels.

• Repeated images of the same scene should produce
the same pixel value at every location in the image (i.e.
all 8 bits should be noise free).

• Pixel values should be linearly related to radiance in
the scene, and an image of a uniformly radiant scene
should produce the same pixel value at every location
(except a small number of known blemishes).

We do not require that the image be blemish-free, nor that it be
geometrically accurate. However, the blemishes must be few in
number and at measurable pixel locations, and the geometric
properties must be stable over time. We will also allow the "image"
to be the result of some constant-time processing upon the camera
output, in particular for photometric linearity calculations which are
based on constants that are unique for each pixel location.

The camera we have is based on technology used for
astronomical cameras. Our supplier, an engineering firm, built a
camera which differs from standard CCD CCTV cameras in several
ways:

•  The  sensor  chip  is  very  high  quality. 

• The sensor chip is in an electric cooler just below 3°F,
which reduces thermal noise to the required level for
10-bit noise-free digitization.

•  The  scanning  rate  of  the  CCD  is  slow  (5  frames/sec, 
non-interlaced)  to  keep  a  high  signal-to-noise  ratio. 

A digitizer of appropriate quality is attached, and the output is fed
into an IBM PC by DMA to the PC memory. The 10-bit image data
is then transformed into an 8-bit image by a photometric correction
calculation, transforming each pixel value by a linear scaling with
the coefficients for each pixel stored on disk on the PC. The
engineering firm provided the photometric calibration data on a PC
disk based on measurements made at its shop. The camera head is
about a six-inch cube, with a standard C-mount for lenses; there is
a separate power supply, and the digitizer occupies two cards in
the PC.
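
The per-pixel photometric correction described above can be
sketched as follows. This is a minimal illustration, assuming the
stored coefficients are one gain and one offset per pixel and that
the result is rounded and clamped to the 8-bit range; the actual
coefficient format and arithmetic used by the supplier are not
given here, and all names are hypothetical.

```python
def correct_pixel(raw10, gain, offset):
    """Map one 10-bit raw value to 8 bits with a per-pixel linear scaling."""
    value = gain * raw10 + offset
    # Round to the nearest integer and clamp to the 8-bit range 0..255.
    return max(0, min(255, int(round(value))))


def correct_image(raw, gains, offsets):
    """Apply the per-pixel correction to an image stored as a list of rows."""
    return [
        [correct_pixel(r, g, o) for r, g, o in zip(raw_row, gain_row, offset_row)]
        for raw_row, gain_row, offset_row in zip(raw, gains, offsets)
    ]
```

Because each pixel has its own coefficients, the cost is one
multiply and one add per pixel regardless of image content, which
is consistent with the constant-time processing allowed in the
specification.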

We have a full set of color filters for both color separation
(R-G-B) and bandpass filter work. These filters can be individually
mounted in front of the lens, or mounted on a 4-position filter
wheel used for the high-precision camera. We have IR filters on
hand for use with uncorrected CCDs, although many CCDs today
for closed-circuit TV use have filters to correct the spectral
responsivity to be close to the human spectral luminous efficiency
curve V(λ). These filters are available for use on the Mobile Camera
Platform as well, although space limitations on the MCP probably
make it mandatory to use individual filter holders instead of filter
wheels.

We can mount other cameras on the SCP at will. Using the
high-precision camera as a standard of measure, the performance
parameters of other cameras can be determined in highly
controlled situations. This may give us data for field use of the
same (less expensive) cameras, and will allow us to make more
informed camera purchasing decisions in the future.


Figure 3-3: Video Facilities in the CIL


4.  Other  Facilities  Related  to  Imaging 

In the lab are a color monitor and a terminal, a monochrome
monitor for each camera, the video crosspoint described above and
shown in figure 3-3, and outgoing wires to the VAX computer
network in the CMU Computer Science Department and the SUN
and IBM PC used for digitization.

On the VAXes and SUNs we have a full library of pixel access
and image manipulation software written in C; these machines
run the UNIX operating system. LISP interfaces will be written
under the auspices of the ALV project.

We are writing a program called the Geometric Calculator that
acts like a combination desk calculator and database for
manipulating geometric entities such as points, lines, vectors,
planes, rotations, and coordinate frames. The basic operation in
this program is to type in an expression and the program will
calculate the result and print it out or store it with a name for later
use. For example, an expression might be "the plane passing
through three points: the point named X; the point defined by
coordinates (B,S,T) in coordinate system C; and the point formed
by the intersection of the line named L with the line through named
point P in a direction parallel to the X-axis of coordinate system D".
(Of course, the notation will be much more arcane than this!) The
Geometric Calculator program, given this expression, would
evaluate it and either print the result or store it with a name given by
the user; collections of named objects can be stored in files and
later recalled for further calculation. The program can deal with
Cartesian as well as spherical coordinates, which are needed for
the mensuration facilities described below. We may add
mechanisms for perspective projections, possibly including some
distortion, so that pixel location coordinates can be used directly as
operands to the Geometric Calculator program. Objects are
represented relative to coordinate systems which may move relative
to each other; in this way, when a coordinate system is changed, all
the objects and coordinate systems defined relative to it will also
change (in relation to world coordinates) automatically.
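
To make the idea concrete, the sketch below shows two of the
operations mentioned above: a plane through three points, and
points expressed in coordinate frames that are defined relative to
one another. It is an illustrative toy, not the Geometric
Calculator's notation or implementation; the class and function
names are hypothetical, and frames are restricted to a rotation
about Z plus a translation for brevity.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_through(p, q, r):
    """Plane through three points as (unit normal n, offset d), with n . x = d."""
    n = cross(sub(q, p), sub(r, p))
    length = math.sqrt(dot(n, n))
    if length == 0.0:
        raise ValueError("points are collinear")
    n = tuple(c / length for c in n)
    return n, dot(n, p)


class Frame:
    """Coordinate frame defined relative to a parent (None means world)."""

    def __init__(self, parent=None, origin=(0.0, 0.0, 0.0), yaw_deg=0.0):
        self.parent = parent
        self.origin = origin
        self.yaw_deg = yaw_deg

    def to_world(self, point):
        """Express a point given in this frame in world coordinates."""
        c = math.cos(math.radians(self.yaw_deg))
        s = math.sin(math.radians(self.yaw_deg))
        x, y, z = point
        moved = (c * x - s * y + self.origin[0],
                 s * x + c * y + self.origin[1],
                 z + self.origin[2])
        return moved if self.parent is None else self.parent.to_world(moved)


# Named objects, in the spirit of a desk calculator with memory.
table = Frame()
plate = Frame(parent=table, origin=(10.0, 0.0, 0.0))
store = {"X": plate.to_world((1.0, 2.0, 0.0))}
```

Because world coordinates are computed on demand through the chain
of parent frames, changing table.origin changes the world position
of every point defined relative to plate without touching those
points.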

In addition to simple objects to view, presenting a variety of
textures, colors, and glossiness, we have a scale model
landscape built with N gauge (1/160) electric train models on a 3 x
5 foot board. The landscape is fairly complex, with a variety of
simulated ground covers, terrain features, railroad track features,
and buildings. The central area of the layout allows
interchangeable plates to be inserted containing different types of
buildings such as urban buildings, a rural scene, an oil refinery, or
an industrial setting. The layout, building plates, and rolling stock
have been carefully selected to yield a very wide variety of:

• surface textures -- smooth and shiny, smooth and
matte, grainy, or with resolvable texture patterns

• shapes -- domes, generalized cylinders, blocks, sloping
surfaces, protrusions of various types, symmetries and
skewed symmetries

• motion -- motion of the train (usually moved by hand a
precise distance between frames), of fixtures on the
train (such as a crane), of vehicles or fixtures in the
background, of several objects at once

• occlusion -- train crossovers, parallel tracks and
coplanar tracks at angles to each other, tracks running
through tunnels or bridges with frameworks, vehicles
passing behind buildings or poles

This layout provides a very rich visual environment for all kinds of
geometric imaging studies. There is also a great deal of tiny detail
so that, for example, an interest point operator looking at an image
of a boxcar will probably find dozens of feature points rather than
the four or six points that would be found in a simpler domain such
as the blocks world. Unfortunately, because of the scale, almost all
flat surfaces will be made of plastic and thus unsuitable for serious
studies of material type identification.

5.  Mensuration  and  Calibration 

The CIL has stringent mensuration (position measurement) and
calibration requirements due to the stated goal of providing images
accurate geometrically to the nearest pixel location and
photometrically to the nearest bit of the pixel value.

5.1  Mensuration 

The CIL must provide both coarse and fine measurement and
control of the positions of objects, cameras, and lights in order to
meet its geometric precision goals. The coarse position of objects
is controlled by the optical table itself, which has a grid of 1/4-20
threaded holes on 1-inch centers across its entire surface. We
have a collection of rods to use for supporting objects as needed,
with accuracy provided only by rulers and levels. Fine position
control can be obtained by optical-quality rotators and
translators to move objects as desired. For the track lighting
system, we have ruled scales mounted on the track itself, but such
positioning is still rather coarse. The arc lamp on its dolly is also
only positionable with ruler accuracy.

However, while we cannot obtain extremely fine control of the
position of everything, we can and do measure positions to great
accuracy. We do this using a set of electronic theodolites
(surveyors' transits) mounted on the corners of the optical table;
these are telescopes with crosshairs on pan/tilt mounts, knobs for
fine aiming of the scope, and a digital readout of horizontal and
vertical angle of the scope. The theodolites have an accuracy of ?0
arc seconds in both vertical and horizontal measurements,
providing measurements accurate to approximately 1/3 pixel width
at working distances. Our theodolites have machine-readable
output jacks, although we have no current plan to interface them to
our computers electronically. We have little stick-on bullseyes that
can be used to create sighting points on a featureless surface.
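
Measuring a point in space with theodolites amounts to
intersecting sight lines from two instruments at known positions.
The sketch below shows one common way to do this, taking the
midpoint of the shortest segment between the two rays. The angle
convention (azimuth measured from +X toward +Y, elevation above
the horizontal plane) and all function names are assumptions for
this example; they are not taken from the lab's procedures.

```python
import math


def ray_direction(azimuth_deg, elevation_deg):
    """Unit vector for a sight line given horizontal and vertical angles."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def sight_angles(station, target):
    """Angles a theodolite at `station` would read when aimed at `target`."""
    dx, dy, dz = (t - s for t, s in zip(target, station))
    return (math.degrees(math.atan2(dy, dx)),
            math.degrees(math.atan2(dz, math.hypot(dx, dy))))


def triangulate(p1, angles1, p2, angles2):
    """Midpoint of the shortest segment between two sight lines."""
    d1, d2 = ray_direction(*angles1), ray_direction(*angles2)
    w = tuple(a - b for a, b in zip(p1, p2))

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("sight lines are parallel")
    t1 = (b * e - c * d) / denom  # parameter along the first ray
    t2 = (a * e - b * d) / denom  # parameter along the second ray
    q1 = tuple(p + t1 * u for p, u in zip(p1, d1))
    q2 = tuple(p + t2 * u for p, u in zip(p2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))
```

With noise-free angles the two rays meet and the midpoint is the
target itself; with real readings the length of the segment between
q1 and q2 gives a quick indication of measurement consistency.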

It requires about 30 seconds for a person to obtain a
measurement of a point on one theodolite, or about 2 minutes all
told to measure a point in space and record its coordinates to the
Geometric Calculator program. At this speed, it is not possible in
general to exhaustively measure a complex object or illumination
setting. It is our plan to use rulers to measure objects, then use the
theodolites to determine exactly where those objects are placed on
the table for viewing and to verify the most critical of the ruler
measurements. Another problem with the theodolites is that their
minimum focusing distance is about 50 inches, which means that
we need four theodolites to cover the entire table top.

5.2  Calibration 

The CIL will undertake several kinds of calibration to ensure the
desired accuracy and precision goals. The most obvious
calibration need is for determining the geometric projection
properties of the cameras in use, which will be accomplished using
a precision grid standing vertically up from the optical table. Our
grid is 75 x 75 cm, with lines 1 mm thick forming 1 cm boxes. The
precision of the line widths and spacings is 0.1 mm. We can place
the grid on the table, measure a few points with the theodolites to
determine the precise 3D coordinates of the grid vertices, and take
a picture of the grid. A second picture with the grid at a different
distance from the camera then provides sufficient information to
calculate the geometric projection properties of the camera. Of
course, such a projection is dependent on the lens and the distance
at which the lens is focused, as discussed above; thus, both
positions of the grid must be within the depth of field of the camera
without adjusting the focus. The precision grid will also be used for
aligning cameras with each other and with the rotation axes on the
Mobile Camera Platform.
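
One simplified way to see why a second grid position adds
information: in a fronto-parallel pinhole model, a single image of
the grid fixes only the ratio of focal length to distance, while two
images separated by a known displacement fix both. The sketch
below works this out under those assumptions (ideal pinhole, grid
perpendicular to the optical axis, no distortion). It is not the
calibration procedure used in the lab, and the function name is
hypothetical.

```python
def camera_from_two_grid_views(spacing1_px, spacing2_px, grid_shift_mm,
                               box_mm=10.0):
    """Recover focal length (in pixels) and the nearer grid distance (in mm).

    spacing1_px, spacing2_px: image size of one grid box in the nearer and
        farther view. In a pinhole model, spacing = focal * box / distance.
    grid_shift_mm: how far the grid was moved away from the camera
        between the two views.
    """
    if spacing1_px <= spacing2_px:
        raise ValueError("the first view must be the nearer (larger) one")
    # From spacing1 / spacing2 = (distance1 + shift) / distance1:
    distance1_mm = grid_shift_mm * spacing2_px / (spacing1_px - spacing2_px)
    focal_px = spacing1_px * distance1_mm / box_mm
    return focal_px, distance1_mm
```

For example, if a 1 cm box measures 10 pixels in the first view and
about 8.33 pixels after the grid is moved 200 mm farther away, the
model gives a focal length of about 1000 pixels and a first grid
distance of about 1000 mm.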

For photometric calibration we will place considerable reliance
on the data supplied with our high-precision camera. Normally, we
will use that camera as our working standard for evaluating other
cameras; from time to time we can check on the calibration of that
camera and have it recalibrated if need be. For routine photometric
measurement, we have step-reflectance targets and
neutral-density filters. We also have a few color charts and a
set of the British Ceramic Colour Standard tiles, a set of 12
tiles 4 x 4 inches with a variety of colors and reflectances, whose
spectral reflectance curves are known to within 1/2% at any
wavelength. We have a hand-held luminance meter (1/3° cone)
and a hand-held incident colorimeter (hemispherical incidence,
cosine weighted) for routine use.
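
A routine linearity check with neutral-density filters might look
like the sketch below. It relies on the standard relation that a
filter of optical density D transmits a fraction 10^(-D) of the
light, fits a straight line to pixel value versus transmitted
fraction, and reports the worst deviation from that line. The
function names and the choice of an ordinary least-squares fit are
assumptions for illustration.

```python
def nd_transmittance(density):
    """Fraction of light passed by a neutral-density filter of density D."""
    return 10.0 ** (-density)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_linearity_error(densities, pixel_values):
    """Largest deviation of the measured pixel values from the fitted line."""
    xs = [nd_transmittance(d) for d in densities]
    slope, intercept = fit_line(xs, pixel_values)
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, pixel_values))
```

A camera meeting the goal of photometric accuracy to the nearest
bit would be expected to show a maximum deviation well under one
gray level over the range of filters used.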

We also have on hand a 6 inch diameter Lambertian diffuser
and a pair of 2 inch diameter polarizers. The polarizers will be
needed for studies of the effect of polarization on pixel values of
various image sensors; we suspect that this effect is substantial
near the edges of the image. The polarizers can also be used for
studying polarized illumination by placing them in front of the arc
lamp.

In an important sense, all geometric calculations in the CIL are
only as accurate as the levels in the theodolites. For this reason,
extra care was taken to ensure good leveling of the theodolites on
the optical table and of the surface of the optical table itself.

6.  Status  and  Future  of  the  Lab 

6.1  Applications  of  the  Lab 

We have many application areas in mind for the Calibrated
Imaging Lab, both for implementing and evaluating current
computer vision theories and for developing a new generation of
more sophisticated theories. One such research area is the study
of shadow geometry. We plan to implement existing theories of
shadow interpretation in a polyhedral environment, and we are now
working on the theory for interpreting "fuzzy" shadow edges
caused by extended (non-point) light sources using both geometric
and possibly photometric methods. The CIL provides adequate
facilities for generating all kinds of shadow problems in images,
including very complex situations such as the shadows of moving
objects.

Photometric studies and color analysis studies can also be
carried out in the CIL. We would like to implement standard
reflectance map methods as well as existing methods for color and
gloss analysis. Then we can use these techniques to develop new
ones that deal with more complexity and realism in images, such as
non-trivial background illumination and non-uniform scene
illumination. These studies may yield insights for identifying
surface material type and [illegible] as well as surface shape
information.

The Mobile Camera Platform in the CIL will be the basis for
gathering high-precision images for binocular and multi-ocular
stereo image sets as well as motion sequences and even
motion stereo sequences. We are developing analysis methods
for motion sequences that we believe will yield substantial accuracy
improvements over previously employed theories, and precise
ground truth data will be important in carrying out such a
comparative evaluation.

Finally, the CIL will provide the necessary facilities for carefully
measuring camera properties. This will allow a means for
obtaining high-precision images, a means for measuring the
accuracy of individual inexpensive cameras before using them in
the field, and a base of general knowledge about cameras for
machine vision that will allow more informed camera purchases in
the future.

We hope that researchers outside of CMU will also benefit from
the existence of the CIL, possibly by coming to obtain
high-precision data (which will not be a trivial process, however).

6.2  Anticipated  Difficulties  In  Using  the  Lab 

For all its precision and flexibility, the Calibrated Imaging Lab
presents some practical difficulties to deal with. Most of these
concern calibration, such as sufficiently careful leveling of the
optical table and theodolites, measuring the precise position, angle,
and illumination distribution of light fixtures (both the arc lamp and
fixtures in the track), relatively poor control over the diffuse
illumination from the drapes, ceiling, and floor, and the problems
inherent in the use of the theodolites for position measurement
(time and minimum focus distance).

The safety concerns regarding the arc lamp have already been
noted.

There are some unresolved technological problems that we may
deal with in the future, such as motorizing the MCP controls,
obtaining color images with a parallel-output color camera, and
performing real-time motion studies which would require a more
sophisticated digitization mechanism. We also have an ongoing
schizophrenia concerning the use of metric or English units. While
metric units are easier to work with, much of our larger scale
equipment (including the optical table itself) is based on English
units.

Finally, there are some optical problems that we may simply have
to live with, such as the narrow depth of field inherent in
small-scale imaging and the movement of the perspective center of a
camera when the lens focus is changed.

We believe all of these problems are manageable, and that the
CIL will be a successful facility in spite of these annoyances.

6.3 Status and Timetable for Lab Construction

The Calibrated Imaging Lab has been described here as we
envision it will be furnished when complete. Much of the equipment
is now in place, but the target date for operation to commence is
January, 1986. The high-precision camera will not be available until
April 1986.

6.4 Acknowledgements

Takeo Kanade has been involved at all stages in planning the
Calibrated Imaging Lab. We are grateful to Stuart J. Smith of the
A. and B. Smith Co., Marry Lichter of Lighting Pittsburgh, and
Edward J. Lesoon, Jr. of the Ana Carpet and Decorating Co., for
their help in obtaining equipment and materials for the lab. Thanks
also to Jim Skees and Dennis Royse for their help making the
necessary arrangements at CMU.




